How does performance scale (vs pgvector) when you have an index and start loading data in parallel? Or how does this scale vs the to-be-released pgvector 0.5.2?
I added an edited note to the bottom of the blog post.
The original post and the experiments were created before pgvector 0.5.1 was out, and we had not realized there was significant work to optimize index creation time in the latest pgvector release.
We reran pgvector benchmarks with pgvector 0.5.1.
Now pgvector index creation is on par with, or 10% faster than, Lantern on a single core. Lantern still allows 30x faster index creation by leveraging additional cores.
> https://github.com/lanterndata/lantern/blob/040f24253e5a2651...
> Operator <-> can only be used inside of an index
Isn't the use of the distance operator in scan+sort critical for generating the expected/correct result that's needed for validating the recall of an ANN-only index?
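To make the concern concrete, here is a minimal Python sketch of how recall is usually validated: the exact neighbors come from a full scan plus sort by distance, and the ANN result is scored against them. The function names (`exact_top_k`, `recall_at_k`) and the L2 metric are assumptions for illustration, not anything taken from Lantern or pgvector.

```python
def l2_squared(a, b):
    """Squared Euclidean distance; preserves the same ordering as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(vectors, query, k):
    """Ground truth via scan + sort: rank every row by distance, keep k ids."""
    order = sorted(range(len(vectors)), key=lambda i: l2_squared(vectors[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the ANN result also returned."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


# Toy data: row ids are list positions.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
query = [0.0, 0.0]

truth = exact_top_k(vectors, query, k=2)
# Hypothetical ANN output that found one of the two true neighbors.
ann_result = [0, 3]
score = recall_at_k(ann_result, truth)
```

Without some way to run the exact scan + sort path, there is no `truth` list to compare the index output against.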