vllm-mlx

waybarrios/vllm-mlx
★ 587 stars · Python · 🤖 AI/LLM · Updated 1 month ago
OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
View on GitHub: https://github.com/waybarrios/vllm-mlx
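
Because the server speaks the OpenAI API, any stock OpenAI client can talk to it once it is running locally. A minimal sketch, assuming the server listens on http://localhost:8000/v1 and serves a model named mlx-community/Llama-3.2-3B-Instruct-4bit; both the port and the model id are assumptions, so check the README for the actual launch command and defaults:

from openai import OpenAI

# Point the standard OpenAI client at the local vllm-mlx server.
# Base URL and model id are assumptions, not documented defaults.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
)
print(response.choices[0].message.content)

Vision-language models (Qwen-VL, LLaVA) would follow the same pattern, with images passed in the OpenAI multimodal message format.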

Quick Install

Copy the config for your client. Some servers may need additional setup; check the README.

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "vllm-mlx": {
      "command": "uvx",
      "args": [
        "vllm-mlx"
      ]
    }
  }
}

Or install with pip: pip install vllm-mlx
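
For the Anthropic-compatible side, which is what Claude Code speaks, the official anthropic Python SDK can be redirected to the local server in the same way. Another hedged sketch; the base URL and model id here are assumptions, not values documented by the project:

import anthropic

# Redirect the Anthropic SDK to the local vllm-mlx server.
# Base URL and model id are assumptions; check the README.
client = anthropic.Anthropic(base_url="http://localhost:8000", api_key="not-needed")

message = client.messages.create(
    model="mlx-community/Qwen2.5-7B-Instruct-4bit",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(message.content[0].text)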

Topics

anthropic, apple-silicon, audio-processing, claude-code, computer-vision, image-understanding, inference, llm, machine-learning, macos, mllm, mlx, multimodal-ai, speech-to-text, stt