LLMIO is a Go-based LLM load balancer gateway that provides a unified REST API, weight-based scheduling, logging, and a modern management interface. It integrates OpenAI, Anthropic, and Gemini models into a single service for LLM clients like Claude Code, Codex, Gemini CLI, and Cherry Studio.
git clone https://github.com/atopos31/llmio.git

LLMIO is a Go-based gateway that consolidates OpenAI, Anthropic, and Gemini models into a single load-balanced service. It provides a unified REST API compatible with OpenAI Chat Completions, Anthropic Messages, and Gemini Native formats, supporting both streaming and non-streaming requests. The gateway implements weight-based scheduling (random or priority strategies) to distribute traffic across providers based on capabilities like tool calling, structured output, and multimodal support. Built-in features include rate limiting, failure handling, session tracking via TraceID, and comprehensive observability with per-request latency breakdown, token usage logging, and cost calculation. A React-based admin UI allows management of providers, models, associations, and request logs with multi-dimensional search filters.
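To make the two scheduling strategies concrete, here is a minimal Python sketch of weight-based selection. It is an illustration only, not LLMIO's Go implementation: the function names are hypothetical, and treating "priority" as "always pick the highest weight" is an assumption rather than documented behavior.

```python
import random


def pick_provider(weights, rng=random):
    """Random strategy sketch: pick a name with probability proportional to its weight."""
    # Entries with a weight of zero or less are treated as disabled (an assumption).
    candidates = [(name, w) for name, w in weights.items() if w > 0]
    if not candidates:
        raise ValueError("no provider with a positive weight")
    names = [name for name, _ in candidates]
    values = [w for _, w in candidates]
    return rng.choices(names, weights=values, k=1)[0]


def pick_by_priority(weights):
    """Priority strategy sketch: always take the entry with the highest weight."""
    candidates = {name: w for name, w in weights.items() if w > 0}
    if not candidates:
        raise ValueError("no provider with a positive weight")
    return max(candidates, key=candidates.get)
```

With weights of 5, 3, and 2, the random strategy sends roughly half of all requests to the first entry over a large number of calls, while the priority sketch sends every request to it until it is removed from the candidate set.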
1. **Install LLMIO:** Download the latest binary from [LLMIO GitHub Releases](https://github.com/llmio/llmio/releases) or install via `go install github.com/llmio/llmio@latest`. Run `llmio --init` to generate a default config file.
2. **Configure Providers:** Edit `config.yaml` to add your API keys for OpenAI, Anthropic, and Google. Set weight-based scheduling (e.g., `weights: { "gpt-4o": 5, "claude-3-5-sonnet": 3, "gemini-1.5-pro": 2 }`).
3. **Enable Logging:** Specify a log file path (e.g., `log_file: /var/log/llmio/llmio.log`) and log level (`log_level: info`).
4. **Start the Gateway:** Run `llmio --config config.yaml` and verify the API is accessible at `http://localhost:8080/v1/chat/completions`.
5. **Test Failover:** Use `curl` to send a test request: `curl -X POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'`. Simulate a provider outage by blocking the provider's API endpoint and observe automatic rerouting.

**Tips:**

- Use the management interface (`http://localhost:8080/admin`) to monitor real-time metrics and adjust weights dynamically.
- For production, deploy LLMIO behind a reverse proxy (e.g., Nginx) with HTTPS and rate limiting.
- Enable caching for system prompts to reduce costs (e.g., `cache_enabled: true`).
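The `curl` test request from the steps above can also be sent from a script. The sketch below builds the same JSON body with Python's standard library. The URL and model name come from the steps; the helper names are hypothetical, and the optional `Authorization: Bearer` header is an assumption, since the text does not say whether the gateway expects a token.

```python
import json
import urllib.request

# Endpoint taken from the steps above; adjust to your deployment.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(model, user_text, url=GATEWAY_URL, api_key=None):
    """Build (but do not send) a POST request in OpenAI Chat Completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: a bearer token is only needed if your gateway enforces one.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_chat_request("gpt-4o", "Hello"))` requires a running gateway; building the request alone does not touch the network, which makes the payload easy to inspect first.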
Route LLM requests across multiple providers to optimize cost and latency
Load-balance Claude Code, Cursor, and other AI editor integrations across model providers
Track token usage, latency, and per-request costs across OpenAI, Anthropic, and Gemini
Implement provider failover and rate-limit fallback for reliable LLM service
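As a rough illustration of the failover use case, the Python sketch below tries providers in order and falls back to the next one when a call fails. It shows the general pattern only; `ProviderError`, `call_with_failover`, and the provider callables are hypothetical names, and LLMIO's own retry and rate-limit logic may differ.

```python
class ProviderError(Exception):
    """Raised by a provider call on failure (outage, rate limit, and so on)."""


def call_with_failover(providers, request):
    """Try each (name, call) pair in order and return (name, result) from the first success.

    `providers` is an ordered list of (name, callable) pairs; each callable
    takes the request and either returns a result or raises ProviderError.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except ProviderError as exc:
            # Record the failure and move on to the next provider in line.
            errors.append((name, str(exc)))
    raise ProviderError(f"all providers failed: {errors}")
```

Ordering the list by weight or by cost gives a simple priority fallback; a real gateway would typically also apply timeouts and skip providers that failed recently.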
No install command available. Check the GitHub repository for manual installation instructions.
git clone https://github.com/atopos31/llmio

Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Set up and configure LLMIO as a load balancer gateway for [YOUR_LLM_PROVIDERS]. Define weight-based scheduling for [MODEL_A] (weight: [X]), [MODEL_B] (weight: [Y]), and [MODEL_C] (weight: [Z]). Enable logging to [LOG_DESTINATION] and test the unified REST API endpoint at [API_URL]. Verify failover behavior by simulating a provider outage for [PROVIDER_NAME].
### LLMIO Load Balancer Configuration Report

**Setup Summary:**

- **Version:** LLMIO v1.2.3 (Go-based)
- **Providers Integrated:** OpenAI (GPT-4o), Anthropic (Claude 3.5 Sonnet), Google (Gemini 1.5 Pro)
- **Weight-Based Scheduling:**
  - GPT-4o: 50% (weight: 5)
  - Claude 3.5 Sonnet: 30% (weight: 3)
  - Gemini 1.5 Pro: 20% (weight: 2)
- **Logging:** Enabled (JSON format) to `/var/log/llmio/llmio.log`
- **API Endpoint:** `http://llm-gateway.internal:8080/v1/chat/completions`

**Test Results:**

1. **Health Check:** All providers responded within 200ms. GPT-4o handled 52% of requests, Claude 30%, and Gemini 18%, aligning with configured weights.
2. **Failover Test:** Simulated an outage for OpenAI by blocking their API. LLMIO automatically rerouted 100% of traffic to Anthropic and Google. No errors were logged, and latency increased by only 15ms.
3. **Management Interface:** Accessed at `http://llm-gateway.internal:8080/admin` to monitor real-time metrics. Observed 472 requests processed in the last hour with an average response time of 187ms.

**Recommendations:**

- Adjust weights to [70, 20, 10] for cost optimization during peak hours.
- Enable caching for repeated prompts (e.g., system messages) to reduce provider costs by ~15%.
- Monitor `/var/log/llmio/llmio.log` for rate-limiting warnings from Anthropic (observed 3 instances in the last 24 hours).

**Next Steps:**

- Deploy this configuration to production and integrate with [YOUR_LLM_CLIENT] (e.g., Claude Code) using the unified API endpoint.
- Set up alerts for provider latency > 500ms or error rates > 5%.
- Schedule a quarterly review to adjust weights based on cost/performance data.
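The percentages in the sample report follow directly from the weights: each model's expected share is its weight divided by the sum of all weights. The short Python sketch below reproduces that arithmetic and compares it with the observed split quoted in the report. The function names are hypothetical, and the model keys are the ones used in the example configuration.

```python
def expected_shares(weights):
    """Convert raw weights into expected traffic fractions that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


def max_deviation(weights, observed):
    """Largest absolute gap between observed and expected share across all models."""
    expected = expected_shares(weights)
    return max(abs(observed[name] - share) for name, share in expected.items())


weights = {"gpt-4o": 5, "claude-3-5-sonnet": 3, "gemini-1.5-pro": 2}
observed = {"gpt-4o": 0.52, "claude-3-5-sonnet": 0.30, "gemini-1.5-pro": 0.18}

shares = expected_shares(weights)        # 5/10, 3/10, 2/10
gap = max_deviation(weights, observed)   # about 0.02, i.e. two percentage points
```

A gap of about two percentage points is consistent with random weighted selection over a few hundred requests; a much larger gap would suggest that some provider was being skipped.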