llmio


LLMIO is a Go-based LLM load balancer gateway that provides a unified REST API, weight-based scheduling, logging, and a modern management interface. It integrates OpenAI, Anthropic, and Gemini models into a single service for LLM clients like Claude Code, Codex, Gemini CLI, and Cherry Studio.


Intermediate · 30 min to implement · automation

Saves ~60 min per use

Quick Install

git clone https://github.com/atopos31/llmio.git

Works with:

Claude

Overview

About This Skill

How to Use

1. **Install LLMIO:** Download the latest binary from [LLMIO GitHub Releases](https://github.com/atopos31/llmio/releases) or install via `go install github.com/atopos31/llmio@latest`. Run `llmio --init` to generate a default config file.
2. **Configure Providers:** Edit `config.yaml` to add your API keys for OpenAI, Anthropic, and Google. Set weight-based scheduling (e.g., `weights: { "gpt-4o": 5, "claude-3-5-sonnet": 3, "gemini-1.5-pro": 2 }`).
3. **Enable Logging:** Specify a log file path (e.g., `log_file: /var/log/llmio/llmio.log`) and log level (`log_level: info`).
4. **Start the Gateway:** Run `llmio --config config.yaml` and verify the API is accessible at `http://localhost:8080/v1/chat/completions`.
5. **Test Failover:** Use `curl` to send a test request: `curl -X POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'`. Simulate a provider outage by blocking that provider's API endpoint and observe automatic rerouting.

**Tips:**
- Use the management interface (`http://localhost:8080/admin`) to monitor real-time metrics and adjust weights dynamically.
- For production, deploy LLMIO behind a reverse proxy (e.g., Nginx) with HTTPS and rate limiting.
- Enable caching for system prompts to reduce costs (e.g., `cache_enabled: true`).
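The `curl` test request from step 5 can also be issued from a script. The following is a minimal sketch using only the Python standard library. It assumes the gateway is listening on `localhost:8080` as in step 4. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and any authentication header your deployment requires is not shown.

```python
import json
import urllib.request

# Assumption: the gateway address from step 4; change it for your deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(model, prompt, url=GATEWAY_URL):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req, timeout=30.0):
    """Send the request and decode the JSON body.

    Raises urllib.error.URLError (or HTTPError) if the gateway is unreachable
    or answers with an error status.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the gateway running, `send_chat_request(build_chat_request("gpt-4o", "Hello"))` should return the decoded response body. Keeping the builder separate from the sender makes it possible to inspect the request without any network access.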

Setup & Installation

Quick Install

No install command available. Check the GitHub repository for manual installation instructions.

Alternative Install (Git Clone)

git clone https://github.com/atopos31/llmio

Requirements

Claude Code or compatible AI agent
Works with: Claude

Quick Start Guide

Install the Skill

Copy the git clone command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Set up and configure LLMIO as a load balancer gateway for [YOUR_LLM_PROVIDERS]. Define weight-based scheduling for [MODEL_A] (weight: [X]), [MODEL_B] (weight: [Y]), and [MODEL_C] (weight: [Z]). Enable logging to [LOG_DESTINATION] and test the unified REST API endpoint at [API_URL]. Verify failover behavior by simulating a provider outage for [PROVIDER_NAME].
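To make the weight and failover placeholders in the template concrete, here is a minimal sketch of weight-proportional selection that skips providers marked as unavailable. It illustrates the general technique only and is not LLMIO's actual scheduler. The function names `pick_provider` and `expected_shares` and the `unavailable` parameter are hypothetical.

```python
import random


def pick_provider(weights, unavailable=frozenset(), rng=random):
    """Pick one model name with probability proportional to its weight,
    skipping any name listed in `unavailable`."""
    candidates = {name: w for name, w in weights.items()
                  if name not in unavailable and w > 0}
    if not candidates:
        raise RuntimeError("no available provider")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


def expected_shares(weights, unavailable=frozenset()):
    """Long-run traffic share per model under the same selection rule."""
    live = {name: w for name, w in weights.items()
            if name not in unavailable and w > 0}
    total = sum(live.values())
    return {name: w / total for name, w in live.items()}
```

Under this rule, weights of 5, 3, and 2 correspond to shares of 50%, 30%, and 20%. If the weight-5 model is marked unavailable, the remaining two split traffic 60% and 40%.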

Example Output

### LLMIO Load Balancer Configuration Report

**Setup Summary:**
- **Version:** LLMIO v1.2.3 (Go-based)
- **Providers Integrated:** OpenAI (GPT-4o), Anthropic (Claude 3.5 Sonnet), Google (Gemini 1.5 Pro)
- **Weight-Based Scheduling:**
  - GPT-4o: 50% (weight: 5)
  - Claude 3.5 Sonnet: 30% (weight: 3)
  - Gemini 1.5 Pro: 20% (weight: 2)
- **Logging:** Enabled (JSON format) to `/var/log/llmio/llmio.log`
- **API Endpoint:** `http://llm-gateway.internal:8080/v1/chat/completions`

**Test Results:**
1. **Health Check:** All providers responded within 200ms. GPT-4o handled 52% of requests, Claude 30%, and Gemini 18%, close to the configured 50/30/20 split.
2. **Failover Test:** Simulated an outage for OpenAI by blocking their API. LLMIO automatically rerouted 100% of traffic to Anthropic and Google. No errors were logged, and latency increased by only 15ms.
3. **Management Interface:** Accessed at `http://llm-gateway.internal:8080/admin` to monitor real-time metrics. Observed 472 requests processed in the last hour with an average response time of 187ms.

**Recommendations:**
- Adjust weights to [70, 20, 10] for cost optimization during peak hours.
- Enable caching for repeated prompts (e.g., system messages) to reduce provider costs by ~15%.
- Monitor `/var/log/llmio/llmio.log` for rate-limiting warnings from Anthropic (observed 3 instances in the last 24 hours).

**Next Steps:**
- Deploy this configuration to production and integrate with [YOUR_LLM_CLIENT] (e.g., Claude Code) using the unified API endpoint.
- Set up alerts for provider latency > 500ms or error rates > 5%.
- Schedule a quarterly review to adjust weights based on cost/performance data.
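
The alert thresholds named in the report (provider latency above 500ms, error rate above 5%) could be checked by a small script. The following is a minimal sketch, assuming per-provider statistics have already been collected from the logs or the management interface. The `ProviderStats` type and `find_alerts` function are hypothetical and not part of LLMIO.

```python
from typing import NamedTuple


class ProviderStats(NamedTuple):
    """Hypothetical per-provider summary for one monitoring window."""
    name: str
    avg_latency_ms: float
    requests: int
    errors: int


def find_alerts(stats, max_latency_ms=500.0, max_error_rate=0.05):
    """Return one message per threshold that a provider exceeds."""
    found = []
    for s in stats:
        if s.avg_latency_ms > max_latency_ms:
            found.append(f"{s.name}: latency {s.avg_latency_ms:.0f}ms "
                         f"exceeds {max_latency_ms:.0f}ms")
        # Treat a provider with no traffic as having no errors.
        error_rate = s.errors / s.requests if s.requests else 0.0
        if error_rate > max_error_rate:
            found.append(f"{s.name}: error rate {error_rate:.1%} "
                         f"exceeds {max_error_rate:.0%}")
    return found
```

Both thresholds are strict, so a provider sitting at exactly 500ms or exactly 5% does not trigger an alert.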