SAIL Media

Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Latent.Space
Mar 13, 2026
∙ Paid
This post originally appeared in Latent Space.

“Models can learn to reason, but they can’t compress the world’s knowledge into a few terabytes of weights.”

Turbopuffer came out of a reading app.

In 2022, Simon was helping his friends at Readwise scale their infra for a highly requested feature: article recommendations and semantic search. Readwise was paying ~$5k/month for their relational database, and vector search would cost ~$20k/month, making the feature too expensive to ship. In 2023, after mulling over the problem from Readwise, Simon decided he wanted to “build a search engine,” which became Turbopuffer.

Turbopuffer helping Readwise today - https://turbopuffer.com/customers/readwise

We discuss:
• Simon’s path: Denmark → Shopify infra for nearly a decade → “angel engineering” across startups like Readwise, Replicate, and Causal → turbopuffer almost accidentally becoming a company
• The Readwise origin story: building an early recommendation engine right after the ChatGPT moment, seeing it work, then realizing it would cost ~$30k/month for a company spending ~$5k/month total on infra and getting obsessed with fixing that cost structure
• Why turbopuffer is “a search engine for unstructured data”: Simon’s belief that models can learn to reason, but can’t compress the world’s knowledge into a few terabytes of weights, so they need to connect to systems that hold truth in full fidelity
• The three ingredients for building a great database company: a new workload, a new storage architecture, and the ability to eventually support every query plan customers will want on their data
• The architecture bet behind turbopuffer: going all in on object storage and NVMe, avoiding a traditional consensus layer, and building around the cloud primitives that only became possible in the last few years

Simon Eskildsen (@Sirupsen): “our tiered storage engine keeps getting better, seamlessly letting you navigate cost/latency tradeoffs. not benchmarks, but in production, on real workloads. query once in a while? object storage. query sometimes? NVMe. query a lot? NVMe/memory”

Quoting turbopuffer (@turbopuffer): “production latency is all that matters, big drops over the past few weeks 📉 big things in the pipes to make turbopuffer even faster. (p99 are cold queries completely from object storage)” — Apr 19, 2024
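The tiering idea in the tweet above — cold data on object storage, frequently queried data on NVMe, the hottest data also in memory — can be sketched as a simple frequency-based placement policy. This is a minimal illustration only: the names, thresholds, and tier labels below are invented for the sketch and are not turbopuffer’s actual API or policy.

```python
# Hypothetical sketch of frequency-based storage tiering, illustrating the
# cost/latency tradeoff described above. All names and thresholds are
# illustrative assumptions, not turbopuffer internals.

from dataclasses import dataclass


@dataclass
class TierPolicy:
    # Queries/day above which data is promoted to a warmer (faster) tier.
    nvme_threshold: float = 1.0      # roughly daily access -> cache on NVMe
    memory_threshold: float = 100.0  # hot workload -> NVMe plus memory

    def tier_for(self, queries_per_day: float) -> str:
        """Pick a tier: rarely queried data stays on cheap object storage,
        warmer data is cached on NVMe, and the hottest also sits in memory."""
        if queries_per_day >= self.memory_threshold:
            return "nvme+memory"
        if queries_per_day >= self.nvme_threshold:
            return "nvme"
        return "object-storage"


policy = TierPolicy()
print(policy.tier_for(0.1))  # queried once in a while
print(policy.tier_for(5))    # queried sometimes
print(policy.tier_for(500))  # queried a lot
```

The appeal of a policy like this is that the cold path is the default: everything lives durably on object storage, and NVMe and memory act purely as promotion tiers, so cost scales with how hot the workload actually is rather than with total data size.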

A guest post by Latent.Space — writer, curator, latent space explorer. Main blog: https://swyx.io · Dev community: https://dx.tips/ · Twitter: https://twitter.com/swyx