Reading today's open-closed performance gap
The complex factors that determine the single evaluation number so many focus on. Plus, how this may change in the future.
This post originally appeared in Interconnects.
“If this direct data access becomes the next frontier of training, open models in their current form will be left behind.”
The current equilibrium is clear: open models will be in perpetual catch-up with closed models. But viewing this gap as a single number, a “distance,” obscures a nuanced and crucial dynamic in which capabilities each set of models actually covers. The most popular benchmark for commenting on this gap is the Artificial Analysis Intelligence Index — a composite benchmark of ~10 sub-evals that Artificial Analysis maintains over time to capture the “frontier” of current language model capabilities.
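To make the reduction concrete, here is a minimal sketch of how a composite index collapses many sub-evals into one number. The benchmark names, scores, and equal weighting below are hypothetical assumptions for illustration; Artificial Analysis’s actual sub-eval list, normalization, and weighting scheme may differ.

```python
# Minimal sketch of a composite "intelligence index": an equal-weighted
# mean over per-benchmark scores. Names, scores, and the equal-weighting
# assumption are hypothetical, not the index's real methodology.

def composite_index(scores: dict[str, float]) -> float:
    """Collapse per-benchmark scores (each on a 0-100 scale) into one number."""
    return sum(scores.values()) / len(scores)  # equal weighting assumed

# Hypothetical sub-eval scores for a single model (not real data).
sub_evals = {
    "MMLU-Pro": 78.0,
    "GPQA Diamond": 62.5,
    "LiveCodeBench": 55.0,
    "AIME": 47.0,
}

print(f"Composite index: {composite_index(sub_evals):.1f}")  # -> 60.6
```

The sketch shows the core issue: very different per-benchmark profiles can produce the same composite score, which is exactly the nuance a single “gap” number hides.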
In particular, I spend a lot of time on how the dynamics that feed into that index are obscured by the natural tendency to reduce performance and trends to a single number. Examples include:
How benchmarks evolve over time, becoming more or less correlated with how people actually use models,
How different models’ real-world performance relates to their benchmark rankings, and
How training regimes evolve over time to move said benchmarks.