Query human AI evaluations in Claude Code — without leaving your IDE

Add HumanJudge as a remote MCP server in Claude Code (VS Code or CLI). Ask Claude to compare models, surface flag patterns, or check AI output quality — backed by 16,668+ blind human reviews. No install. Free.

You're building an AI feature in Claude Code. Halfway through, you stop and wonder: should this be GPT-5.4 or Claude Opus? Will users hate the tone? Has anyone actually tested this kind of output with humans?

Most of the time you switch tabs to Google, get vague answers, and switch back. The HumanJudge MCP server gives Claude Code 5 native tools that query 16,668 blind human evaluations of 15 AI models directly. Ask in chat, get the answer, keep building.

Who this is for

  • AI engineers picking models for production features
  • Full-stack developers building AI-powered products who need a sanity check on output quality
  • Anyone using Claude Code to write code that calls LLMs and wants real evaluation data while they build

What you can ask while you code

| Question | What Claude returns (via MCP) |
| --- | --- |
| "Which model should I default to for marketing copy?" | Ranked leaderboard with pass rates from 16K+ reviews |
| "Compare GPT-5.4 and Claude Opus head to head" | Side-by-side stats, where they differ, real reviewer quotes |
| "What flag patterns does Grok have most of?" | Top flag categories with examples |
| "Check this LLM output: [paste]" | Semantic match against 913 evaluated outputs |
| "What's the most-flagged AI model right now?" | Live feed of recent verdicts |

Setup (1 minute)

Add HumanJudge to your project's .mcp.json file:

.mcp.json
{
  "mcpServers": {
    "humanjudge": {
      "type": "http",
      "url": "https://api.humanjudge.com/mcp"
    }
  }
}
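If your project already has an .mcp.json with other servers in it, the entry above should be merged in rather than pasted over the file. A minimal Python sketch of that merge (the humanjudge entry is from the config above; the file handling and function name are illustrative, not part of Claude Code):

```python
import json
from pathlib import Path

def add_humanjudge(project_root: str) -> dict:
    """Merge the humanjudge server entry into .mcp.json, creating the file if needed."""
    path = Path(project_root) / ".mcp.json"
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Entry from the setup section above; any other servers are left untouched.
    servers["humanjudge"] = {
        "type": "http",
        "url": "https://api.humanjudge.com/mcp",
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Running it against a project that already lists another server keeps both entries side by side, which is exactly what step 1 below asks for.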
  1. Add the config above to your project's .mcp.json

     Create the file at your project root if it doesn't exist. If you have other MCP servers, add "humanjudge" alongside them.

  2. Reload your editor

     VS Code: Cmd+Shift+P → "Reload Window". CLI: restart the session.

  3. Click "needs auth" next to humanjudge

     In the MCP servers list, you'll see humanjudge with a "needs auth" label. Click it.

  4. Sign in with your HumanJudge account

     A browser opens; sign in (free account, 30 seconds) and click Allow.

🔔 Works for both the VS Code extension and the CLI — they share the same .mcp.json config.

Example questions to try right now

"Which AI model should I use for marketing emails in my app?"

Ranked leaderboard with pass rates from real human reviewers.

"Compare GPT-5.4 vs Claude Opus 4.6 for ad copy"

Side-by-side stats with reviewer quotes from real ad-copy evaluations.

"What flag patterns are most common in Grok's outputs?"

Flag category breakdown with frequency.

"Check this output from my AI: [paste]"

Semantic match against 913 evaluated outputs.

FAQ

How do I add an MCP server to Claude Code?

Add it to your project's .mcp.json file with type "http" and the server URL. Reload VS Code (or restart the CLI), click "needs auth" next to the server name, and complete OAuth in the browser.

Does this work in both VS Code extension and CLI?

Yes. Both use the same .mcp.json config file at your project root. Adding the entry once enables it for both surfaces.

Do I need to install anything?

No. Remote MCP — just config + auth. No npm install, no pip install, no local server. Claude Code talks to api.humanjudge.com directly.

Is the HumanJudge MCP server free?

Yes, the server is free. You'll need a Claude account with MCP access (varies by plan) and a free HumanJudge account for OAuth.

What MCP tools does Claude get?

5 tools: get_model_scores (leaderboard), compare_models (side-by-side), get_flags (flag patterns), check_content (semantic match against evaluated outputs), get_latest (recent verdicts).

Can I use this while writing AI code?

Yes — that's the point. While building an AI feature, ask Claude 'which model has the lowest flag rate for marketing emails?' or 'check this prompt response against real human evaluations' and get answers without leaving your IDE.

Is the data live?

Yes. Each tool call hits the live HumanJudge API. Claude sees what's true right now, not a stale snapshot.

Want this in your other tools?

Last updated: 2026-05-10 · Data refreshes live from humanjudge.com