How an Independent Benchmark Team Turned 4-of-40 Models Passing Hard QA into a Majority Win by March 2026
How an independent benchmarking lab discovered only 4 of 40 models beat coin flip on "hard" questions

In late 2025, an independent benchmarking group (OpenBench Labs) published a reproducible evaluation showing that, on a 1,000-item "hard