OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s.
Install the package:

pip install vllm-mlx
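Once the server is running, any OpenAI client can talk to it. A minimal sketch using the official openai Python package; the base URL, port, and model id below are assumptions, so adjust them to your local setup:

```python
# Minimal chat completion against a local OpenAI-compatible endpoint.
# Assumptions: the server listens on http://localhost:8000/v1 and a
# Llama-family model is loaded; both values are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # placeholder model id
    messages=[
        {"role": "user", "content": "Explain continuous batching in one sentence."}
    ],
)
print(response.choices[0].message.content)
```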
To use this server with Claude Desktop over MCP, add this configuration to your claude_desktop_config.json (the uvx entry point here is assumed to match the package name):

{
  "mcpServers": {
    "waybarrios-vllm-mlx-github": {
      "command": "uvx",
      "args": ["vllm-mlx"]
    }
  }
}

No other setup, such as API keys, is needed; the server works out of the box. Restart Claude Desktop, then ask:
"What tools do you have available from vllm mlx?"
Claude will query the server and return a list of the tools it exposes.
"What resources are available in vllm mlx?"
Claude will query available resources and return a list of what you can access.
"Show me details about [specific item] in vllm mlx"
Claude will fetch and display detailed information about the requested item.
"Create a new [item] in vllm mlx with [details]"
Claude will use the appropriate tool to create the resource and confirm success.
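Because the server advertises multimodal support for vision-language models such as Qwen-VL and LLaVA, image inputs should go through the standard OpenAI vision message format. A minimal sketch, again with a placeholder endpoint, model id, and image URL:

```python
# Send an image to a vision-language model via the OpenAI chat format.
# The endpoint, model id, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Qwen2-VL-2B-Instruct-4bit",  # placeholder model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```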
We build custom MCP integrations for B2B companies, from simple connections to complex multi-tool setups.