Recently I was tasked with implementing search in a large eCommerce application. While researching hosted search solutions, Algolia's InstantSearch mesmerized me with its speed - consistently delivering results in under 50ms globally, essentially showing me results as I type!
Hosted search services like Algolia specialize in providing a great search experience for applications, e-commerce stores and knowledge websites.
Inspired by Algolia, I challenged myself to see if I could replicate this kind of performance using familiar technologies. This turned out to be harder than expected. I went through four different architectures before finding something that worked well. Here's what I learned along the way.
My requirements:
Sub-50ms responses
The ability to scale for fast responses globally
The ability to integrate semantic search
The Architecture Journey: 4 Attempts
Attempt #1: Vercel + Neon Postgres
I started with my usual stack: Next.js deployed to Vercel with a Neon PostgreSQL database. Neon supports many extensions and comes with the powerful pg_search and pgvector extensions included, so I could store embeddings and do semantic search alongside regular keyword search.
The implementation was very quick to set up and worked great functionally.
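To make the setup concrete, here's a minimal sketch of the kind of query this enables, written as a Next.js route handler using Neon's serverless driver. The table, columns and embedding helper are illustrative, and I'm using Postgres' built-in full-text search for the keyword side here (pg_search's BM25 syntax is its own thing); the semantic part uses pgvector's cosine-distance operator.

```ts
// Hypothetical Next.js route handler querying Neon over HTTP.
// Assumes a `products` table with a tsvector column `search_vec`
// and a pgvector `embedding` column; adjust to your own schema.
import { neon } from "@neondatabase/serverless";

const sql = neon(process.env.DATABASE_URL!);

export async function GET(request: Request) {
  const q = new URL(request.url).searchParams.get("q") ?? "";
  const queryEmbedding = await embed(q); // your embedding provider of choice

  // Keyword candidates via Postgres full-text search,
  // re-ranked by pgvector cosine distance (the <=> operator).
  const rows = await sql`
    SELECT id, name,
           embedding <=> ${JSON.stringify(queryEmbedding)}::vector AS distance
    FROM products
    WHERE search_vec @@ websearch_to_tsquery('english', ${q})
    ORDER BY distance
    LIMIT 20
  `;
  return Response.json(rows);
}

// Placeholder: swap in whatever embedding model/provider you use.
async function embed(text: string): Promise<number[]> {
  throw new Error("not implemented");
}
```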
The issue was latency. Even though my Next.js API routes were running at the edge via Vercel, every search still required a network call to Neon's centralized PostgreSQL instance. Database queries alone were taking 100-200ms due to network latency. Cold starts added another 100-200ms on top of that.
The takeaway was clear: database location matters more than compute location. Edge functions don't help if your data is centralized.
Attempt #2: The Cloudflare Workers + D1 Stack
Since data locality seemed to be the issue, I tried Cloudflare Workers with D1. Workers run globally, are known for their speed and support a massive network of edge locations. D1 replicates SQLite databases globally, so both compute and data would be close to users. Perfect!
The rebuild went reasonably well. D1's SQLite with FTS5 handles keyword search very efficiently when the database is co-located with the Worker. But vector search was problematic.
D1 doesn't support vector operations natively, which meant I would have to run a separate vector database and perform the reranking calculations in JavaScript. While that doesn't have to be an issue technically, I wasn't willing to take on the extra complexity and abandoned the approach.
My self-imposed constraint was clear: I needed native vector support in the database. Working around it in application code wasn't something I wanted to maintain.
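For what it's worth, the keyword side that did work was simple. A rough Worker sketch against D1 with FTS5 might look like this (the FTS5 table and binding names are illustrative):

```ts
// Minimal Cloudflare Worker sketch for the keyword side with D1 + FTS5.
// Assumes an FTS5 virtual table along the lines of:
//   CREATE VIRTUAL TABLE products_fts USING fts5(name, description);
export interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const q = new URL(request.url).searchParams.get("q") ?? "";

    // FTS5 MATCH query; ORDER BY rank returns the best BM25 matches first.
    const { results } = await env.DB.prepare(
      `SELECT rowid, name, description
       FROM products_fts
       WHERE products_fts MATCH ?
       ORDER BY rank
       LIMIT 20`
    )
      .bind(q)
      .all();

    return Response.json(results);
  },
};
```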
Attempt #3: The Durable Objects Side-Quest
While working with Cloudflare, I read up on Durable Objects and took a small detour. What excited me about Durable Objects is that they have SQLite-backed storage. This means the database lives literally on the same machine as the search logic, so reads are disk-local and essentially instant.
Since Durable Objects are built on top of Cloudflare Workers, switching from a Worker + D1 to a Durable Object with SQLite-backed storage was quick. The performance was excellent - database reads were indeed nearly instantaneous. Combined with the speed of Workers, this was great!
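As an illustration, a search Durable Object boils down to something like this, assuming an FTS5 table similar to the one from the D1 attempt (names are illustrative):

```ts
// Sketch of a search Durable Object with SQLite-backed storage.
import { DurableObject } from "cloudflare:workers";

export class SearchIndex extends DurableObject {
  search(q: string) {
    // storage.sql.exec runs against the object's local SQLite database,
    // so reads are disk-local to the machine executing this code.
    const cursor = this.ctx.storage.sql.exec(
      `SELECT rowid, name
       FROM products_fts
       WHERE products_fts MATCH ?
       ORDER BY rank
       LIMIT 20`,
      q
    );
    return cursor.toArray();
  }
}
```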
Setting aside the fact that this SQLite storage also doesn't support vectors, the other problem is geographic distribution. A Durable Object is created in one region and stays pinned to that region. Users close to the object get great performance, but distant users still experience higher latencies. This is probably circumventable, but it felt iffy to do so, and this is where my side-quest ended.
However, the experiment was useful - it confirmed that co-locating data with compute was key. I just needed that same local performance available in multiple locations simultaneously.
Attempt #4: The Fly.io + Turso Solution
Then I heard about Turso. Turso's embedded replicas let you replicate your central database onto the server running your application, giving each region fast, local access to the data without the latency of remote queries.
The caveat is that an embedded replica takes up some disk space, and the initial sync can take a couple of seconds depending on the size of your data. This ruled out Cloudflare Workers and Vercel Functions because of their storage limits, and I needed a long-running process to avoid re-syncing the database on every start.
For the new setup, I used Fly.io. Fly lets you easily run VMs in multiple regions and scale both horizontally and vertically. This let me deploy the search API and Turso database replicas in multiple regions, so each region could serve queries with low latency and near-instant data access. I also swapped out Node.js for Bun for a little extra performance boost.
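In rough terms, each Fly.io machine runs something like the following sketch, using @libsql/client's embedded-replica options. The file path, env var names and sync interval are placeholders.

```ts
// Per-region setup: a libSQL embedded replica syncing from the central
// Turso database to a local file on the Fly.io machine's volume.
import { createClient } from "@libsql/client";

const db = createClient({
  url: "file:/data/replica.db",             // local SQLite file, read locally
  syncUrl: process.env.TURSO_DATABASE_URL!, // central Turso database
  authToken: process.env.TURSO_AUTH_TOKEN!,
  syncInterval: 60,                         // pull changes in the background every 60s
});

// Reads hit the local replica, so there's no network round trip per query.
const { rows } = await db.execute("SELECT count(*) AS total FROM products");
```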
With an appropriately sized Fly.io machine and no cold starts, this resulted in sub-50ms response times pretty consistently!
Key Technical Insights
A few things became clear after trying these different approaches.
Database location matters more than compute location. Edge compute doesn't help much if your database is in a single region. No matter how fast your API code runs, if every query crosses oceans to reach your data, performance suffers. Local database replicas were what actually enabled global speed.
SQLite works well in production. Modern SQLite, particularly LibSQL with vector extensions, handles production workloads effectively. The combination of FTS5 for full-text search and native vector operations for semantic search creates a capable search engine in a single lightweight database.
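As an illustration of that combination, a hybrid query against a single libSQL database can look roughly like this, using libSQL's native vector functions (vector32, vector_distance_cos); the schema is assumed for the example.

```ts
// Hybrid search in one database: FTS5 narrows to keyword candidates,
// native vector functions re-rank them by cosine distance.
// Assumed schema: products(id, name, embedding F32_BLOB(768)) and an
// FTS5 table products_fts whose rowid matches products.id.
import { createClient } from "@libsql/client";

const db = createClient({ url: "file:/data/replica.db" });

export async function hybridSearch(q: string, queryEmbedding: number[]) {
  const { rows } = await db.execute({
    sql: `
      SELECT p.id, p.name,
             vector_distance_cos(p.embedding, vector32(?)) AS distance
      FROM products_fts
      JOIN products p ON p.id = products_fts.rowid
      WHERE products_fts MATCH ?
      ORDER BY distance
      LIMIT 20
    `,
    args: [JSON.stringify(queryEmbedding), q],
  });
  return rows;
}
```

Filtering with FTS5 first keeps the vector comparison limited to keyword matches, which is what keeps the query cheap without a dedicated vector index.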
Choosing the right tool for the job. Each database and hosting option has its own strengths and tradeoffs. Matching your requirements—like vector search, full-text search, and global performance—to the right technology stack is more effective than trying to force a single tool to do everything.
What's Next
The current system achieves the original goal of sub-50ms global search latency. There are still many improvements to make, some more obvious than others.
Smart caching is the next logical step - implementing keyword and n-gram caching to make popular queries even faster. Streaming search results could improve user experience by showing full-text search results immediately while semantic results populate in the background.
Search analytics would help understand what users actually search for, enabling better optimization of queries and caching strategies. Auto-scaling based on search volume would keep the system responsive during traffic spikes.
Longer term, AI-powered query enhancement could improve search intent understanding, and personalization could tailor results to individual users. With the current architectural foundation, these improvements are more straightforward to implement.
Conclusion
Have suggestions to make this setup even faster? Let me know.
Interested in adding search to your application, webshop or website? Feel free to reach out.