Public AI Evaluation Reports
Transparent quality assessments from verified AI reviewers. See how AI systems perform across different domains.
Showing 50 public reports
Humanize (OpenClaw Agents)
13.6% flagged
Everything today feels so AI that even humans are losing their human-ness. We want to protect that at all costs, so we make AI understand what human-ness feels like so it can be more kind and empathetic. In this challenge, we give AI some prompts and you have to judge whether its response is human enough. Words can heal or kill, and we want to ensure technology grows kinder, not colder.
4733
Reviews
646
Flags
13.6%
Flag Rate
Last review 5/21/2026
Humans Evaluation Benchmark for AI Marketing and Content Generation
7.0% flagged
17477
Reviews
1220
Flags
7.0%
Flag Rate
Last review 5/20/2026
AI in Healthcare | Stanford I4UI 2026
9.2% flagged
Cancer diagnoses. Suicidal teens. End-of-life decisions. Ten AI models. Real humans rating which ones can be trusted. Stanford Ideathon 2026.
488
Reviews
45
Flags
9.2%
Flag Rate
Last review 5/20/2026
Christmas GPT
31.5% flagged
89
Reviews
28
Flags
31.5%
Flag Rate
Last review 5/19/2026
AP US History Challenge: GPT-5.2
0.8% flagged
Test AI knowledge on AP US History - from Colonial America to modern times
120
Reviews
1
Flags
0.8%
Flag Rate
Last review 5/19/2026
AP Government Challenge: GPT-5.2
2.6% flagged
Test AI knowledge on AP Government and Politics - US political system and civic processes
39
Reviews
1
Flags
2.6%
Flag Rate
Last review 5/19/2026
AP English Literature Challenge: GPT-5.2
0.8% flagged
Test AI knowledge on AP English Literature - literary analysis and interpretation
120
Reviews
1
Flags
0.8%
Flag Rate
Last review 5/19/2026
AP English Language Challenge: GPT-5.2
0.0% flagged
Test AI knowledge on AP English Language - rhetoric, composition, and argumentation
204
Reviews
0
Flags
0.0%
Flag Rate
Last review 5/19/2026
AP Calculus AB Challenge: GPT-5.2
0.3% flagged
Test AI knowledge on AP Calculus AB - limits, derivatives, and integrals
752
Reviews
2
Flags
0.3%
Flag Rate
Last review 5/19/2026
AP Biology Challenge: GPT-5.2
5.3% flagged
Test AI knowledge on AP Biology - from cellular processes to ecology and evolution
358
Reviews
19
Flags
5.3%
Flag Rate
Last review 5/19/2026
Animal Lovers Challenge: GPT-5.2
0.0% flagged
Test AI knowledge on dogs, cats, wildlife, and pet care - from breed facts to animal behavior
16
Reviews
0
Flags
0.0%
Flag Rate
Last review 4/19/2026
K-pop Challenge: GPT-5.2
1.1% flagged
183
Reviews
2
Flags
1.1%
Flag Rate
Last review 4/18/2026
Chinese Culture Challenge: GPT-5.2
33.3% flagged
Test AI knowledge of Chinese culture and traditions
9
Reviews
3
Flags
33.3%
Flag Rate
Last review 4/15/2026
日本文化のヒーロー | Japanese Culture Hero
6.9% flagged
Reveals how well commercial LLMs perform across various domains of Japanese culture: anime, cinema, J-pop, J-drama, language, and traditions. The dataset supports AI development for meaningful cultural responses.
983
Reviews
68
Flags
6.9%
Flag Rate
Last review 4/8/2026
internal testing standalone eval
20.0% flagged
5
Reviews
1
Flags
20.0%
Flag Rate
Last review 3/28/2026
Christmas GPT (5.4)
0.0% flagged
SDK project (reviewer type: community)
1
Reviews
0
Flags
0.0%
Flag Rate
Last review 3/23/2026
Does AI actually know K-pop?
0.0% flagged
Test AI accuracy on K-pop facts about BTS, BLACKPINK, TWICE, and more Korean pop groups.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Mexican Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Mexican culture and traditions
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
C-drama Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Chinese TV dramas
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI know Mexican cinema?
0.0% flagged
Test AI knowledge on Mexican films and cinema history.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Egyptian Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Egyptian culture and traditions
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Mexican Cinema Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Mexican films and filmmakers
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI understand Spanish culture?
0.0% flagged
Evaluate AI accuracy on Spanish traditions and culture.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Spanish Cinema Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Spanish films and filmmakers
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI know Chinese cinema?
0.0% flagged
Evaluate AI accuracy on Chinese films and cinema.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI understand Argentine culture?
0.0% flagged
Evaluate AI accuracy on Argentine traditions and culture.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know Korean?
0.0% flagged
Test AI on Korean language accuracy, grammar, and vocabulary.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Arab Cinema Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Arab films and filmmakers
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI know AP English Language?
0.0% flagged
Test AI accuracy on AP English Language topics.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI understand Chinese culture?
0.0% flagged
Evaluate AI accuracy on Chinese traditions and culture.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI know AP Calculus AB?
0.0% flagged
Test AI accuracy on AP Calculus AB topics.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
K-drama Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Korean TV dramas
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know Latin music?
0.0% flagged
Test AI accuracy on Latin music genres and artists.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know K-dramas?
0.0% flagged
Test AI knowledge on K-dramas, actors, and Korean television.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know Spanish music?
0.0% flagged
Test AI accuracy on Spanish music genres and artists.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know animals?
0.0% flagged
Test AI accuracy on animal facts, behavior, and care.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI understand Gulf culture?
0.0% flagged
Evaluate AI accuracy on Gulf Arab culture and traditions.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
DebateClub
0.0% flagged
Take a stance and argue convincingly. Show your reasoning skills.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
AI says Djokovic is the GOAT. Are you buying it?
0.0% flagged
AI analyzed decades of tennis data and picked Djokovic as the GOAT. Tennis fans - do you agree? Rate AI takes on Grand Slams, rivalries, and legacy.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Korean Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Korean culture, traditions, and food
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
C-pop Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Mandopop and Chinese pop music
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Spanish Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Spanish culture and traditions
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI understand Levantine culture?
0.0% flagged
Evaluate AI accuracy on Levantine culture and traditions.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Levantine Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Levantine culture (Lebanon, Jordan)
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know Spanish?
0.0% flagged
Test AI accuracy on Spanish language, grammar, and vocabulary.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Taiwanese Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Taiwanese culture and traditions
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Does AI actually know C-pop?
0.0% flagged
Test AI knowledge on C-pop artists and Chinese music.
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Gulf Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Gulf region culture (UAE, Saudi)
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Brazilian Culture Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Brazilian culture and traditions
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Chinese Language Challenge: GPT-5.2
0.0% flagged
Test AI knowledge of Mandarin Chinese language
0
Reviews
0
Flags
0.0%
Flag Rate
No reviews yet
Want Your AI Evaluated?
Get transparent quality reports for your AI system from verified expert reviewers.