Vector Database benchmark with 1536/768 dim data

	Vector Database benchmark with 1536/768 dim data (github.com)
	26 points by liliuleo93 1044 days ago

4 comments

NickGerleman 1043 days ago

Beyond the other grains of salt, it seems awfully inorganic that the same user advertises this database 6 times as their only submission history.

liliuleo93 1043 days ago

I apologize for the repeated posts. The reason I posted it 6 times is that I aim to announce every release and significant commit. The reason my entire history is centered around this benchmark is because I wanted to introduce my project to the community, potentially with some bias. I began my Hacker News journey at that time and wanted to share what I was working on.

hashtag-til 1043 days ago

Generally, these, say, too "proactive" moves to artificially gain attention for your own GitHub projects make me less likely to test them out, so I'd rather stay with the mainstream options.

VoidWhisperer 1043 days ago

Yeah.. this combined with the fact that this benchmark happens to rank their cloud offering the highest by a wide margin sounds a bit like they are submitting it to market themselves.

binarymax 1043 days ago

The reason https://ann-benchmarks.com is so good is that we can see a plot of recall vs latency. I can see you have some latency numbers in the leaderboard at the bottom, but it's very difficult to make a decision from those alone.

As a practitioner that works with vector databases every day, just latency is meaningless to me, because I need to know if it's fast AND accurate, and what the tradeoff is! You can't have it both ways. So it would be helpful if you showed plots showing this tradeoff, similar to ann-benchmarks.
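To make the tradeoff above concrete, here is a minimal sketch of how recall@k is usually measured against exact search. It is not taken from VectorDBBench or ann-benchmarks: the function names, the random data, and the toy "approximate" search (scanning only a random fraction of the vectors) are all assumptions for illustration, and the fraction scanned stands in for latency.

```python
import random

def recall_at_k(approx_ids, true_ids, k):
    """Fraction of the true top-k neighbours that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k

def top_k(query, vectors, candidate_ids, k):
    """Rank the candidate vectors by squared Euclidean distance to the query."""
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, vectors[i])), i)
        for i in candidate_ids
    )
    return [i for _, i in scored[:k]]

def sweep(fractions, n_vectors=200, dim=8, n_queries=5, k=10, seed=0):
    """Return (fraction scanned, mean recall@k) pairs for a toy sampled search."""
    rng = random.Random(seed)
    vectors = [[rng.random() for _ in range(dim)] for _ in range(n_vectors)]
    queries = [[rng.random() for _ in range(dim)] for _ in range(n_queries)]
    all_ids = range(n_vectors)
    results = []
    for fraction in fractions:
        # Hypothetical tuning knob: how much of the data the search looks at.
        n_scan = max(k, int(n_vectors * fraction))
        recalls = []
        for query in queries:
            truth = top_k(query, vectors, all_ids, k)       # exact answer
            candidates = rng.sample(all_ids, n_scan)        # cheaper, lossy scan
            approx = top_k(query, vectors, candidates, k)
            recalls.append(recall_at_k(approx, truth, k))
        results.append((fraction, sum(recalls) / len(recalls)))
    return results

for fraction, recall in sweep([0.1, 0.5, 1.0]):
    print(f"scanned {fraction:.0%} of vectors -> recall@10 = {recall:.2f}")
```

Each (cost, recall) pair is one point on the kind of curve ann-benchmarks plots; a single latency number corresponds to one such point with the recall left unstated.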

liliuleo93 1043 days ago

Thanks for your suggestion; this is a very good question. I have been asked this a few times, so please allow me to quote one of my responses in the repo:

" With respect to recall vs Performance, your idea is indeed correct. However, several reasons have guided us to our current approach:

1. We are not solely benchmarking open-source systems; we are also focusing on cloud services. Some of these services, such as Zilliz and Pinecone, don't allow users to customize their parameters to tune the recall, aiming to simplify their usage. Consequently, creating a recall vs Performance graph is not feasible. Also, this benchmark allows users to customize parameters for systems that support tuning, so they can produce their own results for comparison.

2. There already exists a number of benchmarks doing what you've suggested, which target individuals with ANN search backgrounds. Our goal is to make this benchmark as straightforward as possible and to assist people who lack understanding about the inner workings of each system.

3. Concerning reproducibility, generating the recall vs QPS graph that you mentioned would require conducting a multitude of tests to obtain enough data points, which considerably reduces reproducibility. "

the link is: https://github.com/zilliztech/VectorDBBench/issues/200#issue...

1ba9115454 1043 days ago

The vendor's performance metrics give their own product the highest marks.

DougBTX 1043 days ago

Previous discussions from last time they posted this: https://news.ycombinator.com/item?id=36856815

If they’re going to rank themselves so much higher than their competition, they might as well call that out up front and explain why the discrepancy is so large.

marginalia_nu 1043 days ago

It's really hard to benchmark this sort of a thing. There are so many layers of caching and external factors that play into it, from all manner of sources including the operating system load and configuration, disk firmware, hardware configuration, and so forth; and the harder you try to isolate these effects, the farther you get from a realistic benchmark because all the factors that were removed are affecting real world performance in a big way.

This is a big reason why for a long time many large DBMS-providers had clauses in their licenses prohibiting 3rd party benchmarks. You can fairly easily construct a benchmark that makes any given DBMS seem great or awful, and there's no such thing as an objective test.

liliuleo93 1043 days ago

Fully agree with this idea. Any trick can be a real-world strategy, and it is impossible for anyone to claim that they have an absolutely fair benchmark.

So the only way we can approach it is to provide more real-world-like cases and set aside whatever tricks vendors might play inside their systems.

Also, people will be concerned about how representative the benchmark's cases are. So, as the next step, we plan to make this benchmark more like a framework that supports customized cases.

liliuleo93 1043 days ago

Yes, of course vendors have bias. But IMHO, if a benchmark is reproducible and the use cases can match users' needs, then we can say it can somehow help decision making.

redskyluan 1043 days ago

The inclusion of the OpenAI dataset in this benchmark adds a layer of realism that's often missing in standardized tests with datasets like SIFT and DEEP.
