As a principal eng, side-stepping a migration and having a good local dev experience is too good of a deal to pass up.
That being said, turbopuffer looks interesting. I will check it out. Hopefully their local dev experience is good.
The number of people I know who’ve had unrecoverable shard failures on Qdrant is too high to take it seriously.
The bit about paying for publicity doesn’t bother me.
Edit: I haven’t found anything egregious that the CEO has said, or anything really sketchy. The shard failure warnings look serious, but the issues look closed.
https://x.com/generall931/status/1809303448837582850?s=46
There used to be a particularly egregious benchmarking issue involving one of the founders, but I can’t find it anymore.
The sharding and consensus issues were from around a year and a half ago, so maybe it’s gotten better.
There are just so many options in this space that I don’t know why you’d go with one of the least correct vendors (whether the incorrectness amounts to deception is a different question that I can’t answer).
Works well for the vast majority of our customers (although we get the very occasional complaint about wanting a dev environment that works offline). The dataset sizes for local dev are usually so small that the cost rounds to free.
It's only occasional because the people who care about dev environments that work offline are most likely to just skip you and move on.
For actual developer experience, as well as a number of use cases like customers with security and privacy concerns, being able to host locally is essential.
Fair enough if you don't care about those segments of the market, but don't confuse a small number of people asking about it with a small number of people wanting it.
I'm watching them with excitement. We all learn from each other. There's so much to do.
- start small on a laptop. Going through procurement at companies is a pain
- test things in CI reliably. Outages don’t break builds
- transition from laptop scale to web scale easily, with the same API and just a different backend
Otherwise it’s really hard to justify not using S3 vectors here
The current dev experience is to start with faiss for PoCs, move to pgvector, and then to something heavy-duty like one of the Lucene wrappers.
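For the faiss-at-PoC-scale stage, here's a minimal sketch of what that stage boils down to (corpus size and dimensions are made up; plain numpy stands in for faiss's flat index, which computes the same exact search, just much faster):

```python
import numpy as np

# Hypothetical corpus of embeddings and a single query vector.
rng = np.random.default_rng(0)
corpus = rng.random((10_000, 64), dtype=np.float32)
query = rng.random(64, dtype=np.float32)

# Exact (brute-force) nearest-neighbor search -- what a flat index does.
d2 = ((corpus - query) ** 2).sum(axis=1)  # squared L2 distance to every vector
top_k = np.argsort(d2)[:4]                # ids of the 4 nearest neighbors
print(top_k)
```

At laptop scale this is often all a PoC needs; the pgvector/Lucene step only matters once the corpus stops fitting this pattern.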
Note that we already have an in-your-own-VPC offering for large orgs with strict security/privacy/regulatory controls.
In many CI environments, unit tests don't have network access; it's not purely a price consideration.
(not a turbopuffer customer but I have been looking at it)
I've never seen a hard block on network access (how do you install packages/pull images?) but I am sympathetic to wanting to enforce that unit tests run quickly by minimizing/eliminating RTT to networked services.
We've considered the possibility of a local simulator before. Let me know if it winds up being a blocker for your use case.
You pre-build the images with packages installed beforehand, then use those images offline.
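Concretely, one common shape of this setup (image name hypothetical):

```shell
# Online, once: bake all packages/deps into the test image.
docker build -t ci-tests:prebuilt .

# Offline, every CI run: networking disabled entirely for the tests.
docker run --rm --network=none ci-tests:prebuilt
```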
The fact that you haven't come across this kind of setup suggests that your hundreds of CI systems are not representative of the industry as a whole.
(I am familiar with Bazel, but I'll have to save the war stories for another thread. It's not a build tool we see our particular customers using.)
Without that it’s unfortunately a non-starter.
I appreciate your responsiveness and open mind.
I really really enjoy & learn a lot from the mixedbread blog. And they find good stuff to open source (although the product itself is closed). https://www.mixedbread.com/blog
I feel like there's a lot of overlap but also probably a lot of distinction too. Pretty new to this space of products though.
When you're operating at the 100B scale, you're pushing beyond the capacity that most on-prem setups can handle. Most orgs have no choice but to put a 100B workload into the nearest public cloud. (For smaller workloads, considerations are different, for sure.)
I'm not sure how that'd work with the binary quantization phase, though. For example, we use Matryoshka, and some of the bits matter way more than others, so that might be super painful.
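A toy illustration of that concern (all shapes and scales are made up): Matryoshka-style training concentrates information in the leading dimensions, while plain sign-based binary quantization weights every bit equally.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 256

# Hypothetical Matryoshka-style embedding: magnitude (information) tapers
# off toward the later dimensions.
emb = rng.standard_normal(dim).astype(np.float32)
emb *= np.linspace(1.0, 0.1, dim, dtype=np.float32)

# Plain binary quantization: one sign bit per dimension, all equal weight.
bits = (emb > 0).astype(np.uint8)

# But the bits are not equally important: flipping an early bit moves the
# underlying float vector much more than flipping a late one.
front, back = np.abs(emb[: dim // 2]).mean(), np.abs(emb[dim // 2 :]).mean()
print(f"avg magnitude, first half: {front:.2f}, second half: {back:.2f}")
```

So a Hamming distance over `bits` treats a disagreement in dimension 0 the same as one in dimension 255, even though the first carries far more signal here.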
Different vector indexes have very different recall, and even the parameters for each dramatically impact it.
HNSW can have very good recall even at high vector counts.
There's also the embedding model, whether you're quantizing, whether it's pure RAG vs hybrid BM25 / static word embeddings vs graph connections, whether you're reranking, etc.
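For concreteness, the recall@k underlying all of these comparisons is just overlap with exact search (minimal sketch; the IDs are hypothetical):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors the approximate index returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# The approximate index found 2 of the 4 true nearest neighbors:
print(recall_at_k([3, 7, 9, 1], [3, 9, 2, 5], k=4))  # 0.5
```

Every factor listed above (index type, parameters, quantization, reranking) shifts this number, which is why single-number vendor comparisons are so slippery.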
I was curious given the cloud discussion: a quick search suggests default AWS SSD bandwidth is 250 MB/s, and you can pay more for 1 GB/s. Similarly for S3, one HTTP connection is < 100 MB/s, and you can pay for more parallel connections. So the hot binary quantized search index is doing a lot of work to minimize both, for the initial hot queries and for pruning later fetches. Very cool!
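Rough arithmetic on why the compact hot index matters at those bandwidths (the corpus size and dimensionality here are made up, only the 250 MB/s figure comes from above):

```python
# Hypothetical corpus: 100M vectors, 1024 dims.
n, dim = 100_000_000, 1024

full_bytes = n * dim * 4   # float32: ~410 GB
bq_bytes = n * dim // 8    # binary quantized, 1 bit per dim: ~12.8 GB

ssd_bw = 250e6  # bytes/s, the default-SSD figure above
print(f"full-precision scan: {full_bytes / ssd_bw:.0f} s")    # 1638 s
print(f"binary-quantized scan: {bq_bytes / ssd_bw:.0f} s")    # 51 s
```

A 32x shrink turns an unusable cold scan into something a warm cache plus pruning can actually hide.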
turbopuffer will also continuously monitor production recall at the per-shard level (or on-demand with https://turbopuffer.com/docs/recall). Perhaps counterintuitively, the global recall will actually be better than the per-shard recall if each shard is asked for its own, local top_k!
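A toy model of why global recall can beat per-shard recall (the rank-dependent miss probability is entirely made up, and real indexes behave differently; this only illustrates the mechanism): each shard is asked for a full top_k, but the true global top_k is spread across shards, so within any one shard the neighbors that matter globally sit at shallow local ranks, where an approximate index misses least.

```python
import random

random.seed(0)
S, k, trials = 8, 10, 20_000

# Hypothetical approximate index: the i-th best vector in a shard (1-based
# local rank) is found with probability decaying in rank -- deeper local
# neighbors are the ones an ANN index tends to miss.
def find_prob(rank):
    return max(0.0, 1.0 - 0.04 * (rank - 1))

# Per-shard recall@k: a shard asked for its own local top k must surface
# local ranks 1..k, including the deep, miss-prone ones.
per_shard_recall = sum(find_prob(i) for i in range(1, k + 1)) / k

# Global recall@k: the k true global neighbors are spread across S shards
# (simplification: assume a shard's c global neighbors are exactly its
# local top c), so each sits at a shallow, reliable local rank.
hits = 0
for _ in range(trials):
    counts = [0] * S
    for _ in range(k):
        counts[random.randrange(S)] += 1  # shard each true neighbor lives in
    for c in counts:
        hits += sum(random.random() < find_prob(j) for j in range(1, c + 1))
global_recall = hits / (trials * k)

print(f"per-shard recall@{k}: {per_shard_recall:.2f}")
print(f"global recall@{k}:   {global_recall:.2f}")
```

Under these assumptions the merged global result recovers noticeably more of the true top_k than any single shard does of its own, matching the counterintuitive claim above.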
What CPU are they using here?