SAIL Media

(Ahead of AI) Beyond Standard LLMs

Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers

Sebastian Raschka, PhD
Nov 04, 2025
From DeepSeek R1 to MiniMax-M2, the largest and most capable open-weight LLMs today remain autoregressive decoder-style transformers, which are built on flavors of the original multi-head attention m…
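The teaser's baseline, causal (autoregressive) multi-head attention in a decoder-style transformer, can be sketched minimally in NumPy. This is an illustrative reference implementation, not code from the post; the function name and weight layout are assumptions.

```python
import numpy as np

def multi_head_attention(x, n_heads, Wq, Wk, Wv, Wo):
    """Causal multi-head self-attention over one sequence.

    x: (seq_len, d_model) token embeddings
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices (illustrative layout)
    """
    T, d = x.shape
    hd = d // n_heads  # per-head dimension

    # Project and split into heads: (n_heads, seq_len, head_dim)
    q = (x @ Wq).reshape(T, n_heads, hd).transpose(1, 0, 2)
    k = (x @ Wk).reshape(T, n_heads, hd).transpose(1, 0, 2)
    v = (x @ Wv).reshape(T, n_heads, hd).transpose(1, 0, 2)

    # Scaled dot-product scores: (n_heads, seq_len, seq_len)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)

    # Causal mask: position t may attend only to positions <= t
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, merge heads, output projection
    out = (weights @ v).transpose(1, 0, 2).reshape(T, d)
    return out @ Wo
```

The causal mask is what makes the model autoregressive: changing a later token cannot change the representation of an earlier one. The linear-attention hybrids in the post's title replace the quadratic `scores` matrix with kernelized approximations that avoid materializing it.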

© 2026 SAIL media, LLC