by lqhl
813 days ago
MyScaleDB uses approximate nearest neighbor (ANN) algorithms such as ScaNN, HNSW, and IVF, so it may not achieve a 100% recall rate. Depending on the search parameters used, however, it can attain recall rates of up to 95% or even 99%. Given that embedding vectors are a lossy compression of the original text or images, is 100% recall necessary? I am interested in understanding the practical implications. Disclaimer: I am an employee at MyScale.
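For readers unfamiliar with the term, recall here is the fraction of the true nearest neighbors that the approximate search actually returns. A minimal sketch in plain Python, assuming Euclidean distance and a made-up approximate result (`approx_ids` is hypothetical and does not come from MyScaleDB or any real index):

```python
import math

def exact_top_k(query, vectors, k):
    """Brute-force k nearest neighbors by Euclidean distance (the ground truth)."""
    dists = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in dists[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate result contains."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

vectors = [(0, 0), (1, 0), (0, 2), (3, 3), (5, 5)]
query = (0, 0)

exact_ids = exact_top_k(query, vectors, k=3)
# Hypothetical ANN output that missed one true neighbor (id 2) and returned id 3 instead.
approx_ids = [0, 1, 3]
recall = recall_at_k(approx_ids, exact_ids)
```

A recall of 95% or 99% means that, on average, that share of the brute-force neighbors appears in the ANN result.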
For the app, maybe not. But as a database absolutist, I think you must be able to dump all rows of a table with
... assuming that the ordered values are unique across the table and fully sortable.

A recall of <100% may skip some rows in the limit_result. Those rows then also won't show up in the main table's scan result, which could corrupt a data dump process that relies on sorted output.
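One way to picture this concern is a toy simulation of a dump that pages through sorted output. This is only a sketch under assumptions: it uses keyset pagination ("give me the next n rows after the last key I saw"), which may differ from the query elided above, and `lossy_top_n` is a hypothetical stand-in that deterministically hides every 20th candidate to mimic roughly 95% recall. Real ANN indexes do not fail in this regular pattern.

```python
def exact_top_n(keys, after, n):
    """Exact equivalent of: WHERE key > after ORDER BY key LIMIT n."""
    return sorted(k for k in keys if k > after)[:n]

def lossy_top_n(keys, after, n, miss_every=20):
    """Hypothetical lossy index: every `miss_every`-th candidate is invisible."""
    candidates = sorted(k for k in keys if k > after)
    visible = [k for i, k in enumerate(candidates) if (i + 1) % miss_every != 0]
    return visible[:n]

def dump(keys, page_fn, n):
    """Keyset-paginated dump: each page resumes after the last key returned."""
    out, after = [], float("-inf")
    while True:
        page = page_fn(keys, after, n)
        if not page:
            return out
        out.extend(page)
        after = page[-1]  # skipped rows below this key are never revisited

keys = list(range(1000))  # unique, fully sortable values
exact = dump(keys, exact_top_n, 50)
approx = dump(keys, lossy_top_n, 50)
missing = sorted(set(keys) - set(approx))
```

With exact ordering the dump returns every row. With the lossy version, any row hidden inside a page's key range is passed over when the cursor advances, so it never appears in the output even though the dump finishes without an error.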