mcp-llm-eval

berkayildi/mcp-llm-eval
MCP server for LLM evaluation gates.

Quick Install

Copy the config for your editor. Some servers may need additional setup — check the README.

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "mcp-llm-eval": {
      "command": "uvx",
      "args": [
        "mcp-llm-eval"
      ]
    }
  }
}

Or install with pip: `pip install mcp-llm-eval`

README Excerpt

A local **Model Context Protocol (MCP) server** that packages LLM evaluation gates as reusable CI/CD primitives. Run datasets against multiple models, score responses with an LLM-as-judge, and enforce quality thresholds — all through MCP tools that AI agents can call.

```mermaid
flowchart LR
    A[PR opened] --> B[Run dataset<br/>through models]
```
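The gate workflow described above can be sketched in plain Python. This is a hypothetical illustration, not the package's actual API: the `judge` here is a keyword-matching stub standing in for the LLM-as-judge, and all function and field names are assumptions.

```python
# Minimal sketch of an evaluation gate: run a dataset through a model,
# score each response with a judge, and pass/fail against a threshold.
# The judge is a stub; the real server would call an LLM to score.

def judge(prompt: str, response: str) -> float:
    """Stub judge: 1.0 if the prompt's last token appears in the response."""
    return 1.0 if prompt.split()[-1] in response else 0.0

def run_gate(dataset: list[dict], model, threshold: float) -> dict:
    scores = [judge(case["prompt"], model(case["prompt"])) for case in dataset]
    mean = sum(scores) / len(scores)
    return {"mean_score": mean, "passed": mean >= threshold}

dataset = [
    {"prompt": "Name the capital of France: Paris"},
    {"prompt": "Name the capital of Japan: Tokyo"},
]
echo_model = lambda p: p  # trivial "model" that echoes its prompt back
result = run_gate(dataset, echo_model, threshold=0.8)
```

In a real pipeline the threshold check would gate the PR: a failing `result["passed"]` blocks the merge.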

Tools (6)

`check_thresholds`, `compare_runs`, `format_pr_comment`, `get_evaluation`, `list_evaluations`, `run_evaluation`
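To illustrate what a run-comparison tool in this family typically computes, here is a hypothetical sketch of `compare_runs`-style logic: the mean-score delta between a baseline run and a candidate run, with a regression flag. The field names and the regression tolerance are assumptions, not the server's actual schema.

```python
# Hypothetical compare-runs logic: summarize two runs' scores and flag
# a regression when the candidate's mean drops by more than a tolerance.

def compare_runs(baseline: list[float], candidate: list[float],
                 max_regression: float = 0.05) -> dict:
    base_mean = sum(baseline) / len(baseline)
    cand_mean = sum(candidate) / len(candidate)
    delta = cand_mean - base_mean
    return {
        "baseline_mean": base_mean,
        "candidate_mean": cand_mean,
        "delta": delta,
        "regressed": delta < -max_regression,  # regression beyond tolerance
    }

report = compare_runs([0.9, 0.8, 1.0], [0.7, 0.8, 0.6])
```

A `format_pr_comment`-style tool would then render such a report as a markdown table for the pull request thread.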

Topics

benchmark, cicd, evaluation, llm, mcp