★ 587 stars
Python
🤖 AI/LLM
Updated 2mo ago
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
View on GitHub →
Try with Claude — $10 free →
Quick Install
Copy the config for your editor. Some servers may need additional setup — check the README.
Claude Desktop
Claude Code
Cursor
Add to claude_desktop_config.json:
{
"mcpServers": {
"vllm-mlx": {
"command": "uvx",
"args": [
"vllm-mlx"
]
}
}
}
📋 Copy
Run in terminal:
claude mcp add vllm-mlx uvx vllm-mlx
📋 Copy
Add to .cursor/mcp.json:
{
"mcpServers": {
"vllm-mlx": {
"command": "uvx",
"args": [
"vllm-mlx"
]
}
}
}
📋 Copy
Or install with pip: pip install vllm-mlx
Topics
anthropic apple-silicon audio-processing claude-code computer-vision image-understanding inference llm machine-learning macos mllm mlx multimodal-ai speech-to-text stt