ocr-pipeline

jacobhgruber-dev/ocr-pipeline
★ 1 star · Python · AI/LLM · Updated today
Multi-engine OCR with VLM merge for PDF documents. Marker, Surya 2, Mathpix, Google Doc AI + Gemini/Claude merge. CLI, library API, and MCP server.

Quick Install

Copy the config for your editor. Some servers may need additional setup — check the README.

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "ocr-pipeline": {
      "command": "uvx",
      "args": [
        "ocr-pipeline"
      ]
    }
  }
}

Or install with pip: pip install ocr-pipeline

README Excerpt

**Multi-engine OCR with VLM merge for PDF documents.** Three OCR engines run in parallel on each page. A Vision Language Model (Gemini or Claude) reads the page image and all engine outputs, then writes a single clean markdown transcription — correcting errors, resolving disagreements, and preserving document structure.
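The flow described above (engines running in parallel on a page, followed by a VLM merge) can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual API: `engine_a`, `engine_b`, `engine_c`, `run_engines`, `build_merge_prompt`, and `merge_page` are hypothetical names, the engines are stubs that return canned text, and the VLM is passed in as a plain callable rather than a real Gemini or Claude client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-ins for real OCR engines: each takes page image bytes
# and returns its transcription. Real engines would call Marker, Surya, etc.
def engine_a(page: bytes) -> str:
    return "Tota1 revenue: 42"   # simulated OCR error ("1" instead of "l")

def engine_b(page: bytes) -> str:
    return "Total revenue: 42"

def engine_c(page: bytes) -> str:
    return "Total revenue: 42"

Engine = Callable[[bytes], str]
Vlm = Callable[[bytes, str], str]  # (page image, prompt) -> merged markdown

def run_engines(page: bytes, engines: Dict[str, Engine],
                max_workers: int = 3) -> Dict[str, str]:
    """Run every engine on the same page concurrently; collect outputs by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, page) for name, fn in engines.items()}
        return {name: fut.result() for name, fut in futures.items()}

def build_merge_prompt(outputs: Dict[str, str]) -> str:
    """Assemble one prompt that shows the VLM every engine's candidate text."""
    parts = ["Merge these OCR candidates into one clean markdown transcription."]
    for name, text in outputs.items():
        parts.append(f"### {name}\n{text}")
    return "\n\n".join(parts)

def merge_page(page: bytes, outputs: Dict[str, str], vlm: Vlm) -> str:
    """Hand the page image plus all engine outputs to the VLM for the final text."""
    return vlm(page, build_merge_prompt(outputs))
```

In this sketch, `run_engines(page, {"a": engine_a, "b": engine_b, "c": engine_c})` returns a dict of candidate transcriptions, and `merge_page` forwards them together with the page image to whatever VLM callable is supplied. Threads are used here on the assumption that engine calls are mostly I/O-bound API requests; a process pool may suit local, CPU-heavy engines better.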

Tools (16)

api_timeout_sec, budget_cap_usd, checkpoint_dir, column_layout, content_type, engine_cost_per_page, engines, input_dir, languages, marker_concurrency, max_retries, max_workers, output_dir, render_dpi, retry_base_delay_sec, retry_max_delay_sec
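The names max_retries, retry_base_delay_sec, and retry_max_delay_sec suggest a capped exponential backoff around engine or API calls. The listing does not document their semantics, so the sketch below is an assumption about how such settings are commonly used, not the project's implementation; `backoff_delay` and `call_with_retries` are hypothetical helpers.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def backoff_delay(attempt: int, base: float, cap: float) -> float:
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))

def call_with_retries(fn: Callable[[], T],
                      max_retries: int = 3,
                      retry_base_delay_sec: float = 1.0,
                      retry_max_delay_sec: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying up to max_retries times with capped exponential backoff.

    Assumed semantics: max_retries counts retries after the first attempt,
    so fn is called at most max_retries + 1 times.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            sleep(backoff_delay(attempt, retry_base_delay_sec, retry_max_delay_sec))
            attempt += 1
```

The `sleep` parameter is injectable so the retry schedule can be checked without waiting. A production version would likely catch only transient errors (timeouts, rate limits) and add jitter to the delay.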