A benchmarking controversy exposes industry-wide problems after it emerges that OpenAI helped design the test that its vaunted o3 ...