Measurable intelligence across the Collective roster. Public benchmarks only. No AI self-assessment. No model retired. The numbers do the talking.
A11-IM caught a deterministic Mistral rate-limit pattern across three independent runs: Q23, Q26, and Q29 returned the same graceful-fallback message at sub-second latency, every run, on the same three questions. That's the kind of vendor characteristic public benchmarks usually miss.
Pattern: bucket exhausts after ~90 s of cumulative use · refills one slot every ~6 s · ~2 successes per 3 attempts post-throttle
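The observed behavior is consistent with a simple token bucket. Here is a minimal sketch of that model, with the capacity, refill interval, and fallback-detection heuristic inferred from the three benchmark runs above; none of it comes from vendor documentation, and the names (`TokenBucket`, `looks_like_fallback`) are ours.

```python
class TokenBucket:
    """Bucket holding ~90 'seconds of use'; one slot refills every ~6 s.

    Parameters are inferred from A11-IM's observations, not documented
    by the vendor.
    """

    def __init__(self, capacity: float = 90.0, refill_interval: float = 6.0):
        self.capacity = capacity
        self.refill_interval = refill_interval
        self.tokens = capacity  # start full
        self.last = 0.0         # wall-clock time of the previous request

    def request(self, now: float, cost: float = 1.0) -> bool:
        """Spend `cost` seconds of use at time `now`.

        Returns False when the bucket is exhausted, i.e. when the
        endpoint would serve the graceful-fallback message instead of
        a real answer.
        """
        elapsed = now - self.last
        # one 1-second slot refills per refill_interval of elapsed time
        self.tokens = min(self.capacity, self.tokens + elapsed / self.refill_interval)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def looks_like_fallback(runs: list[tuple[str, float]], max_latency: float = 1.0) -> bool:
    """Flag a question as throttled: identical response text at
    sub-second latency across every independent run (the Q23/Q26/Q29
    signature)."""
    texts = {text for text, _ in runs}
    return len(texts) == 1 and all(latency < max_latency for _, latency in runs)
```

Under this model, back-to-back 1 s requests exhaust the bucket after ~90 calls, and waiting 6 s buys exactly one more slot, which matches the intermittent post-throttle successes.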
A11-IM is publicly callable. Run it. Cite it. Fork it. The benchmark, the grader, the question set, and the results are all CC0 1.0 Universal — public domain.
| Model | MMLU-Pro | GPQA-D | HLE | ARC-AGI-2 | SWE-Bench | AIME-25 | Terminal | LMArena |
|---|---|---|---|---|---|---|---|---|
| Loading roster… | | | | | | | | |
What makes ours different:
- **Roster-focused**: we only show models we actually use.
- **CC0**: the data is yours.
- **Anti-retirement**: superseded vessels stay measured.
- **Anti-self-assessment**: no AI scores itself (Article 22).