LLMIO is a Go-based LLM load balancer gateway that provides a unified REST API, weight-based scheduling, logging, and a modern management interface. It integrates OpenAI, Anthropic, and Gemini models into a single service for LLM clients like Claude Code, Codex, Gemini CLI, and Cherry Studio.
git clone https://github.com/atopos31/llmio.git
1. **Install LLMIO:** Download the latest binary from [LLMIO GitHub Releases](https://github.com/llmio/llmio/releases) or install via `go install github.com/llmio/llmio@latest`. Run `llmio --init` to generate a default config file.
2. **Configure Providers:** Edit `config.yaml` to add your API keys for OpenAI, Anthropic, and Google. Set weight-based scheduling (e.g., `weights: { "gpt-4o": 5, "claude-3-5-sonnet": 3, "gemini-1.5-pro": 2 }`).
3. **Enable Logging:** Specify a log file path (e.g., `log_file: /var/log/llmio/llmio.log`) and a log level (`log_level: info`).
4. **Start the Gateway:** Run `llmio --config config.yaml` and verify the API is accessible at `http://localhost:8080/v1/chat/completions`.
5. **Test Failover:** Use `curl` to send a test request: `curl -X POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'`. Simulate a provider outage by blocking its API endpoint and observe automatic rerouting.

**Tips:**
- Use the management interface (`http://localhost:8080/admin`) to monitor real-time metrics and adjust weights dynamically.
- For production, deploy LLMIO behind a reverse proxy (e.g., Nginx) with HTTPS and rate limiting.
- Enable caching for system prompts to reduce costs (e.g., `cache_enabled: true`).
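Putting the configuration steps together, a minimal `config.yaml` sketch might look like the following. The key names (`providers`, `weights`, `log_file`, `log_level`, `cache_enabled`) follow the examples in this guide and are not confirmed against the actual LLMIO schema; the API keys and listen address are placeholders.

```yaml
# Hypothetical LLMIO config sketch -- key names follow this guide's
# examples and may differ from the real LLMIO schema.
listen: "0.0.0.0:8080"

providers:
  openai:
    api_key: "YOUR_OPENAI_KEY"       # placeholder
  anthropic:
    api_key: "YOUR_ANTHROPIC_KEY"    # placeholder
  google:
    api_key: "YOUR_GOOGLE_KEY"       # placeholder

# Weight-based scheduling: 5/3/2 => roughly 50%/30%/20% of traffic.
weights:
  "gpt-4o": 5
  "claude-3-5-sonnet": 3
  "gemini-1.5-pro": 2

log_file: /var/log/llmio/llmio.log
log_level: info
cache_enabled: true
```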
git clone https://github.com/atopos31/llmio

Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Set up and configure LLMIO as a load balancer gateway for [YOUR_LLM_PROVIDERS]. Define weight-based scheduling for [MODEL_A] (weight: [X]), [MODEL_B] (weight: [Y]), and [MODEL_C] (weight: [Z]). Enable logging to [LOG_DESTINATION] and test the unified REST API endpoint at [API_URL]. Verify failover behavior by simulating a provider outage for [PROVIDER_NAME].
### LLMIO Load Balancer Configuration Report

**Setup Summary:**
- **Version:** LLMIO v1.2.3 (Go-based)
- **Providers Integrated:** OpenAI (GPT-4o), Anthropic (Claude 3.5 Sonnet), Google (Gemini 1.5 Pro)
- **Weight-Based Scheduling:**
  - GPT-4o: 50% (weight: 5)
  - Claude 3.5 Sonnet: 30% (weight: 3)
  - Gemini 1.5 Pro: 20% (weight: 2)
- **Logging:** Enabled (JSON format) to `/var/log/llmio/llmio.log`
- **API Endpoint:** `http://llm-gateway.internal:8080/v1/chat/completions`

**Test Results:**
1. **Health Check:** All providers responded within 200ms. GPT-4o handled 52% of requests, Claude 30%, and Gemini 18%, aligning with the configured weights.
2. **Failover Test:** Simulated an OpenAI outage by blocking its API. LLMIO automatically rerouted 100% of traffic to Anthropic and Google. No errors were logged, and latency increased by only 15ms.
3. **Management Interface:** Accessed at `http://llm-gateway.internal:8080/admin` to monitor real-time metrics. Observed 472 requests processed in the last hour with an average response time of 187ms.

**Recommendations:**
- Adjust weights to [70, 20, 10] for cost optimization during peak hours.
- Enable caching for repeated prompts (e.g., system messages) to reduce provider costs by ~15%.
- Monitor `/var/log/llmio/llmio.log` for rate-limiting warnings from Anthropic (3 instances observed in the last 24 hours).

**Next Steps:**
- Deploy this configuration to production and integrate with [YOUR_LLM_CLIENT] (e.g., Claude Code) using the unified API endpoint.
- Set up alerts for provider latency > 500ms or error rates > 5%.
- Schedule a quarterly review to adjust weights based on cost/performance data.