caret

rouapps/caret
★ 10 stars · Rust · 📊 Data/Analytics · Updated 2mo ago
A terminal tool for inspecting and cleaning large LLM training datasets. It handles JSONL, Parquet, and CSV files using memory-mapped I/O, and provides near-duplicate detection, token visualization, dataset linting, and an MCP server.
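The description does not say which near-duplicate algorithm caret uses. As a rough illustration of the general idea only, the sketch below flags record pairs whose word 3-shingles have a Jaccard similarity at or above a threshold. The function names (`shingles`, `jaccard`, `near_duplicates`) and the 0.6 threshold are hypothetical and are not caret's API. It compares every pair, which is fine for a handful of records; tools working at training-set scale typically approximate this with MinHash and locality-sensitive hashing instead.

```rust
use std::collections::HashSet;

/// Hypothetical helper: the set of k-word shingles (k consecutive
/// whitespace-separated words, lowercased) for one record's text.
fn shingles(text: &str, k: usize) -> HashSet<String> {
    assert!(k > 0, "shingle size must be positive");
    let words: Vec<String> = text.split_whitespace().map(|w| w.to_lowercase()).collect();
    if words.len() < k {
        // Texts shorter than k words become a single shingle (or none if empty).
        let mut set = HashSet::new();
        if !words.is_empty() {
            set.insert(words.join(" "));
        }
        return set;
    }
    words.windows(k).map(|w| w.join(" ")).collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Hypothetical helper: index pairs (i, j), i < j, whose shingle sets have
/// Jaccard similarity >= threshold. Pairwise, so O(n^2) comparisons.
fn near_duplicates(records: &[&str], k: usize, threshold: f64) -> Vec<(usize, usize)> {
    let sets: Vec<HashSet<String>> = records.iter().map(|r| shingles(r, k)).collect();
    let mut pairs = Vec::new();
    for i in 0..sets.len() {
        for j in (i + 1)..sets.len() {
            if jaccard(&sets[i], &sets[j]) >= threshold {
                pairs.push((i, j));
            }
        }
    }
    pairs
}

fn main() {
    let records = [
        "The quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog!",
        "An entirely different training example about Rust",
    ];
    // Records 0 and 1 share 6 of 8 distinct 3-shingles (Jaccard 0.75).
    println!("{:?}", near_duplicates(&records, 3, 0.6)); // prints [(0, 1)]
}
```

Lowercasing and whitespace tokenization are simplifying choices here; a real pipeline might also normalize punctuation or shingle over characters or tokenizer tokens.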

Quick Install

Copy the config for your editor. Some servers may need additional setup; check the README.

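Add the following to claude_desktop_config.json. This entry starts the server with `cargo run`, which builds and runs the Cargo package found in the current working directory (or one of its parents), so it only works if the process is launched from inside a clone of the repository. The `--` makes cargo pass the remaining argument, `caret`, to the built binary instead of interpreting it itself: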

{
  "mcpServers": {
    "caret": {
      "command": "cargo",
      "args": [
        "run",
        "--",
        "caret"
      ]
    }
  }
}