Hacker News
by loxias 1038 days ago
> it is really problem dependent as is throughput. Recall and latency numbers reported in benchmarks are typically on very well curated and structured datasets and average across all queries

This is correct. :) Don't worry, I know enough to not trust any published benchmarks on this topic... (I'm also not your target market. I wrote my first "vector DB" in 2001 for music recognition.)

I still think it's crucial to include just a few more facts though, because otherwise the statement is meaningless.

Consider:

A. "we can find an approximate NN match, euclidean, D=768, N=70000000, under 100ms on a modern laptop"

vs

B. "we can find an approximate NN match, euclidean, D=2, N=70000000, under 100ms on a modern laptop"

vs

C. "we can find an approximate NN match, euclidean, D=768, N=70000000, under 100ms on 1000x modern laptops"

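Notice how B and C aren't impressive; they're trivially beatable. :) B is low-dimensional enough that a plain grid or k-d tree handles it, and C has enough hardware to shard the data and brute-force each shard.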
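
To put rough numbers on the three claims, here is a back-of-envelope sketch. It assumes uncompressed float32 vectors and an exhaustive (brute-force) scan, neither of which is stated in the claims themselves; the helper names `raw_bytes` and `scan_multiply_adds` are made up for illustration.

```python
# Back-of-envelope sizing for claims A, B and C.
# Assumptions (not from the claims): float32 storage, exhaustive scan.
BYTES_PER_FLOAT32 = 4


def raw_bytes(n, d, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to hold n uncompressed d-dimensional vectors."""
    return n * d * bytes_per_value


def scan_multiply_adds(n, d):
    """Multiply-adds for one exhaustive squared-Euclidean scan over n vectors."""
    return n * d


N = 70_000_000
claims = {"A": (768, 1), "B": (2, 1), "C": (768, 1000)}  # label -> (D, machines)

for label, (d, machines) in claims.items():
    per_machine = N // machines  # vectors each machine holds if sharded evenly
    gb = raw_bytes(per_machine, d) / 1e9
    madds = scan_multiply_adds(per_machine, d)
    print(f"{label}: {gb:.2f} GB and {madds:.2e} multiply-adds per machine")
```

Under those assumptions, A is about 215 GB of raw vectors on one machine, B is about 0.56 GB, and C is about 0.22 GB and roughly 54 million multiply-adds per machine per query. Only A forces you to do something clever.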