Defining the standard for human-AI trust.

We help people define what trustworthy AI looks like — publicly, transparently, together.

AI is everywhere.

Trust is missing.

Who decides if AI gets it right?

95%

of gen-AI pilots yield no business impact.

Not because the AI is bad — but because trust is hard to build and easy to break.

Source: MIT Report, August 2025

⚖️

"May be wrong" isn't accountability.

Companies ship AI with disclaimers instead of responsibility — until lawsuits force settlements.

Source: Character.AI + Google settlement, January 2026

No public standards for "trustworthy AI."

The EU AI Act is coming. Safety researchers are leaving big labs. Everyone talks about "responsible AI," but there's still no public standard for what "trustworthy" actually means.

💡

There's no absolute ground truth for AI output.

Truth is plural and context-dependent.

The answer isn't better benchmarks — it's an evolving process of collecting and attributing human judgments.

Public judgments.
Real attribution.
Evolving standards.

We're building a public platform where anyone can judge AI and have that judgment recorded, credited, and rewarded.

Every evaluation is attributed to a real person. Every judgment is timestamped and auditable. Standards evolve with emerging consensus — not static leaderboards, but living records of human judgment.

HumanJudge is setting the standard for human-AI accountability.

What we stand for

Transparency

All judgments are public — linked to a real person, timestamped, and auditable.

Author's Rights

Evaluators own their work. Your judgments stay attributed to you.

Beyond "One Right Answer"

Reality is context-dependent. We use time-decayed aggregation, which weights recent judgments more heavily than old ones, so results track evolving consensus instead of freezing into fixed scores.

Long-term over hype

We build evaluation infrastructure that lasts.

Why we built this

After nearly a decade building production AI solutions across travel tech, financial services, and healthcare, we kept seeing the same pattern: trust is hard to build, and it keeps breaking.

Benchmarks didn't help, because they measure generic capabilities, not performance in specific use cases. A single skeptical user during a Product Hunt launch or an enterprise pilot could shape the entire narrative.

That insight led to the GrandJury research paper: time-decayed aggregation, complete traceability, and dynamic rubrics that evolve with emerging consensus.
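To make "time-decayed aggregation" concrete, here is a minimal sketch of one common way to do it: an exponential half-life weighting over attributed, timestamped judgments. This is an illustration under stated assumptions, not the formula from the GrandJury paper. The `Judgment` fields, the 0-to-1 score scale, and the 30-day half-life are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Judgment:
    # All fields are hypothetical, chosen only for this illustration.
    evaluator: str       # the person the judgment is attributed to
    score: float         # assumed scale: 0.0 (reject) to 1.0 (endorse)
    timestamp: datetime  # when the judgment was recorded


def time_decayed_score(judgments, now, half_life_days=30.0):
    """Weighted mean of scores; a judgment's weight halves every half_life_days."""
    weighted_sum = 0.0
    total_weight = 0.0
    for j in judgments:
        age_days = (now - j.timestamp).total_seconds() / 86400.0
        # Clamp negative ages (future timestamps) to zero so weight never exceeds 1.
        weight = 0.5 ** (max(age_days, 0.0) / half_life_days)
        weighted_sum += weight * j.score
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no judgments to aggregate")
    return weighted_sum / total_weight


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
judgments = [
    Judgment("alice", 1.0, now - timedelta(days=60)),  # older endorsement
    Judgment("bob", 0.0, now),                         # fresh rejection
]
print(time_decayed_score(judgments, now))
```

With these two judgments, a plain average would give 0.5. The decayed score is lower because the 60-day-old endorsement carries a quarter of the weight of the fresh rejection. Each record keeps its evaluator and timestamp, so the aggregate can be traced back to the individual judgments behind it.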

Arthur Cho

Founder


Arthur has spent nearly a decade at the intersection of AI products and the humans who use them. He's shipped production AI solutions at scale across travel tech, financial services, healthcare, and legal applications.

Frustrated that AI trust came down to brand reputation rather than evidence, he published the GrandJury research paper proposing a new framework for AI evaluation.

Arthur holds a Master of Applied Data Science from the University of Michigan.