llama-vulkan-strix

hec-ovi/llama-vulkan-strix
★ 0 stars Python AI/LLM Updated today
llama.cpp OpenAI-compatible server on the Vulkan backend for AMD Strix Halo (gfx1151). Test GGUF models with weights pinned to GTT not VRAM, plus an optional keyless web-search MCP sidecar. Docker Compose, no ROCm.
View on GitHub → Try with Claude — $10 free →

Quick Install

Copy the config for your editor. Some servers may need additional setup — check the README.

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "llama-vulkan-strix": {
      "command": "uvx",
      "args": [
        "llama-vulkan-strix"
      ]
    }
  }
}

Or install with pip: pip install llama-vulkan-strix

README Excerpt

<h1 align="center">llama-vulkan-strix</h1> <p align="center"> <strong>llama.cpp OpenAI-compatible server on the Vulkan backend, for testing GGUF models on an AMD Strix Halo APU (gfx1151). Weights load into GTT (unified RAM), not the small VRAM carve-out, and there is a script to prove it. Optional keyless web-search MCP sidecar.</strong>

Topics

amddockerdocker-composegfx1151ggufgttllama-cppllm-inferencelocal-llmmcpopenai-compatibleryzen-aistrix-halovulkan