AI Reviews

Independent human evaluation of production AIs. Reports drawn from our evaluation data, and the arenas where the evaluation happens.

Reports

Live · Stanford I4UI 2026 May 10, 2026

AI in Healthcare — Stanford I4UI 2026

Cancer diagnoses. Suicidal teens. End-of-life decisions. Ten AI models. Real humans rating which ones can be trusted — judged on tone and responsibility, not medical correctness.

Essay Apr 29, 2026

Your Eval Pipeline Has Zero Disagreement

If your automated eval never flags anything, it's not measuring quality — it's confirming your assumptions. 16,668 human evaluations show what LLM-as-judge misses.

Report Apr 2026

Can Grok Analyze Instagram Posts?

We tested Grok on social media tasks — Reel scripts, platform strategy, content creation. See what 154 human reviewers found.

Report Apr 2026

Grok 4.1 Fast Review

1,789 blind evaluations by 154 reviewers. 92.4% pass rate. See specific flags, comparisons, and what reviewers actually said.

Report Apr 2026

Grok for Social Media Marketing

Email copy, social posts, ad scripts — broken down by format with pass rates, flag patterns, and ROI analysis.

Report Apr 2026

Is Grok Good for Marketing?

154 reviewers blind-tested Grok 4 and Grok 4.1 Fast against GPT, Claude, and Gemini. See where Grok ranks and what it gets flagged for.

Report Apr 2026

Grok's Personality & Humor

Does Grok's edgy personality help or hurt? Reviewer data on humor, tone mismatches, and when personality works vs backfires.

Report Apr 2026

Grok for Marketers: ROI & Quality Data

3,162 human evaluations of Grok's marketing output. Use this data to prove ROI, benchmark quality, and make informed AI decisions.

Report Apr 2026

ChatGPT vs Claude for Marketing

What 147 human reviewers found when blind-testing GPT-5.4, GPT-5.2, Claude Sonnet 4.6, and Claude Opus 4.6 on marketing tasks.

AI Arenas

Featured

AI Marketing & Content Generation

Review AI-generated marketing content — social posts, cold emails, taglines, scripts — and judge: would it actually work?

Feb 27, 2026
Culture & Language

Japanese Culture Hero

How well can AI explain Japanese culture across anime, cinema, J-pop, J-drama, and traditions? Put yourself in the shoes of a Japanese culture expert and evaluate.

Feb 26, 2026
Sports & Entertainment

AI analyzed Emma Raducanu's career. It has some bold claims.

From US Open champion to 10 coaches in 5 years. AI crunched the data - now it needs your judgment.

Feb 2, 2026
Sports & Entertainment

AI says Djokovic is the GOAT. Are you buying it?

Tennis fans are shaping how AI learns about the sport. Your judgment on GOAT debates helps AI understand what makes a champion.

Feb 1, 2026
AP Courses

Does AI know AP Government?

Constitution, branches, policies — test AI on US government.

Jan 30, 2026
AP Courses

Does AI know AP Calculus AB?

Limits, derivatives, integrals — test AI on calculus.

Jan 30, 2026
AP Courses

Does AI actually know AP Biology?

Cells, genetics, ecology — test AI on biology concepts.

Jan 30, 2026
AP Courses

Does AI know AP English Literature?

Literary analysis, classics, poetry — test AI on AP Lit.

Jan 30, 2026
AP Courses

Does AI know AP English Language?

Rhetoric, analysis, composition — test AI on AP Lang.

Jan 30, 2026
AP Courses

Does AI actually know AP US History?

Colonial era to modern times — test AI on American history.

Jan 30, 2026

Does AI actually know animals?

Pets, wildlife, behavior — test what AI claims about animals.

Jan 30, 2026
Arabic & Middle Eastern

Does AI know Arab cinema?

Egyptian golden age to modern films — test AI on Arab film.

Jan 30, 2026
Arabic & Middle Eastern

Does AI actually know Arabic music?

Classic and modern Arab music — test AI on Arabic sounds.

Jan 30, 2026
Arabic & Middle Eastern

Does AI actually know Arabic?

MSA, dialects, script — test AI on the Arabic language.

Jan 30, 2026
Arabic & Middle Eastern

Does AI understand Egyptian culture?

Ancient and modern — test AI on Egyptian knowledge.

Jan 30, 2026
Arabic & Middle Eastern

Does AI understand Levantine culture?

Lebanon, Syria, Jordan, Palestine — test AI on the Levant.

Jan 30, 2026
Arabic & Middle Eastern

Does AI understand Gulf culture?

UAE, Saudi, Qatar, Kuwait — test AI on Gulf states.

Jan 30, 2026
Spanish & Latin American

Does AI actually know Latin music?

Reggaeton, salsa, cumbia, and more — test AI on Latin sounds.

Jan 30, 2026
Spanish & Latin American

Does AI actually know Spanish?

Grammar, dialects, nuance — test AI on the Spanish language.

Jan 30, 2026
Spanish & Latin American

Does AI understand Spanish culture?

Traditions, history, regional differences — test AI on Spain.

Jan 30, 2026
Spanish & Latin American

Does AI know Spanish cinema?

From Almodóvar to modern films — test AI on Spanish film.

Jan 30, 2026
Spanish & Latin American

Does AI actually know Spanish music?

Flamenco, pop, regional styles — see what AI gets right.

Jan 30, 2026
Spanish & Latin American

Does AI understand Mexican culture?

Traditions, history, food, music — test AI on Mexico.

Jan 30, 2026
Spanish & Latin American

Does AI know Mexican cinema?

Golden age to modern masterpieces — test AI on Mexican film.

Jan 30, 2026
Spanish & Latin American

Does AI understand Argentine culture?

Tango, gauchos, food, football — test AI on Argentina.

Jan 30, 2026
Spanish & Latin American

Does AI understand Brazilian culture?

Carnival, samba, food, football — test AI on Brazil.

Jan 30, 2026
East Asian Culture

Does AI actually know J-pop?

From classic artists to modern idols — see if AI understands Japanese pop music.

Jan 30, 2026
East Asian Culture

Does AI understand Taiwanese culture?

Food, traditions, modern life — test AI on Taiwan.

Jan 30, 2026
East Asian Culture

Does AI actually know C-dramas?

Historical, modern, and wuxia — see what AI gets right about Chinese dramas.

Jan 30, 2026
East Asian Culture

Does AI know Chinese cinema?

From Hong Kong action to mainland dramas — test AI on Chinese film.

Jan 30, 2026
East Asian Culture

Does AI actually know Chinese?

Characters, tones, dialects — see if AI gets Chinese language right.

Jan 30, 2026
East Asian Culture

Does AI actually know C-pop?

Mandopop, Cantopop, and more — test AI on Chinese pop music.

Jan 30, 2026
East Asian Culture

Does AI actually know J-dramas?

Classic and modern Japanese dramas — see what AI gets right.

Jan 30, 2026
East Asian Culture

Does AI actually know Japanese?

Kanji, grammar, keigo — see if AI gets Japanese language nuances right.

Jan 30, 2026
East Asian Culture

Does AI understand Japanese culture?

From traditions to modern life — test AI on Japanese cultural knowledge.

Jan 30, 2026
East Asian Culture

Does AI actually know anime?

From classics to new releases — test what AI claims about anime.

Jan 30, 2026
East Asian Culture

Does AI actually know K-dramas?

Plot twists, actors, iconic scenes — see what AI gets right about Korean dramas.

Jan 30, 2026
East Asian Culture

Does AI understand Chinese culture?

History, traditions, modern life — test AI on Chinese cultural knowledge.

Jan 30, 2026
East Asian Culture

Does AI know Japanese cinema?

From Kurosawa to modern anime films — test AI on Japanese film.

Jan 30, 2026
East Asian Culture

Does AI know Korean cinema?

From Parasite to classics like Oldboy: test AI on Korean film knowledge.

Jan 30, 2026
East Asian Culture

Does AI actually know Korean?

Grammar, vocabulary, nuance — see if AI gets Korean language right.

Jan 30, 2026
East Asian Culture

Does AI actually know K-pop?

BTS, BLACKPINK, NewJeans, and more — test what AI gets right and wrong about your favorite idols.

Jan 30, 2026
East Asian Culture

Does AI understand Korean culture?

Traditions, food, history, modern life — test AI on what it claims to know about Korea.

Jan 30, 2026

Don't see your area of expertise?

Apply to lead an evaluation →