Olmo Hybrid and future LLM architectures
The latest Olmo model and discussions at the frontier of open-source post-training tools.
Join us for today’s LIVE at 10:45am PT. Nathan will be in attendance with a special guest. Link to come!
This post originally appeared in Interconnects.
“The frank reality of new architectures for open models is that the open-source software tooling support is horrific.”
So-called hybrid architectures are far from new among open-weight models these days. We now have the recent Qwen 3.5 (previewed by Qwen3-Next), Kimi Linear last fall (a smaller release than their flagship Kimi K2 models), Nvidia’s Nemotron 3 Nano (with the bigger models expected to drop soon), IBM Granite 4, and other less notable models. This is one of those times when a research trend looks like it’s getting adopted everywhere at once (maybe the Muon optimizer too, soon?).
To tell this story, we need to go back a few years to December 2023, when Mamba and Striped Hyena were taking the world by storm — asking the question: Do we need full attention in our models? These early models fizzled out, partially for the same reasons they’re hard to work with today — tricky implementations, open-source tooling problems, more headaches in training — but also because the models fell over a bit when scaled up. The hybrid models of the day weren’t quite good enough yet.
These models are called hybrid because they mix these new recurrent neural network (RNN) modules with the traditional attention that made the transformer famous; they all work best with this mix of modules. The RNN layers keep part of the computation compressed in a hidden state to be used for the next token in the prediction — a summary of all information that came before — an idea with an extremely long historical lineage in deep learning, e.g. back to the LSTM. This setup avoids the quadratic compute cost of attention (i.e., it avoids the KV cache of the attention operator growing by one entry per token), and can even help solve new problems.
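The memory contrast above can be sketched in a few lines. This is a toy illustration, not any specific model’s layer: the recurrence matrices, shapes, and function names are all made up for the example. The point is only that an RNN-style layer carries a fixed-size state forward, while attention accumulates one key/value pair per token.

```python
import numpy as np

d = 4  # toy hidden size

def rnn_step(state, x, A, B):
    """Recurrent update: the state stays a fixed (d,) vector
    no matter how many tokens have been processed."""
    return np.tanh(A @ state + B @ x)

def attention_step(kv_cache, x, Wk, Wv):
    """Attention-style update: the KV cache grows by one
    (key, value) pair for every token seen."""
    kv_cache.append((Wk @ x, Wv @ x))
    return kv_cache

rng = np.random.default_rng(0)
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))

state, cache = np.zeros(d), []
for t in range(1000):
    x = rng.normal(size=d)
    state = rnn_step(state, x, A, B)          # O(1) memory per layer
    cache = attention_step(cache, x, Wk, Wv)  # O(t) memory per layer

print(state.shape)  # stays (4,) regardless of sequence length
print(len(cache))   # 1000 -- one KV entry per token
```

A hybrid model interleaves layers of both kinds, so most of the stack pays the constant-memory price while a few attention layers retain exact recall over the full context.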
