by mattashii
812 days ago
> Considering that embedding vectors represent a lossy compression of the original text or images, is achieving a 100% recall necessary?

For the app, maybe not. But as a database absolutist, I think you must be able to dump all rows of a table with

  WITH
    limit_result AS (SELECT *, {similarity} AS metric FROM table ORDER BY {similarity} ASC LIMIT 10),
    dist AS (SELECT MAX(metric) AS max_m FROM limit_result)
  SELECT table.*, {similarity} AS metric FROM table, dist WHERE {similarity} > dist.max_m
  UNION ALL
  SELECT * FROM limit_result

... assuming that the ordered values are unique across the table and fully sortable.

A recall of <100% may skip some rows in limit_result. Those rows then also won't show up in the main table's scan result, because their metric is below max_m, thus potentially corrupting a data dump process that uses sorted output.
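To make the failure mode concrete, here is a minimal Python sketch of the same dump logic over plain numbers. The helper names (`dump_via_top_k`, `exact_top_10`, `lossy_top_10`) are made up for illustration, and the "lossy" lookup is just an assumed stand-in for an approximate index that misses one true neighbour; it is not how any particular index behaves.

```python
def dump_via_top_k(metrics, top_k):
    """Mimic the query: the top-k rows plus every row whose metric exceeds the top-k maximum."""
    limit_result = top_k(metrics)
    max_m = max(limit_result)
    rest = [m for m in metrics if m > max_m]
    return sorted(limit_result + rest)


def exact_top_10(metrics):
    # 100% recall: the true 10 smallest metrics.
    return sorted(metrics)[:10]


def lossy_top_10(metrics):
    # Hypothetical approximate lookup: misses the 10th-nearest row
    # and returns the 11th-nearest in its place.
    ordered = sorted(metrics)
    return ordered[:9] + [ordered[10]]


metrics = list(range(1, 21))  # unique, fully sortable values

exact_dump = dump_via_top_k(metrics, exact_top_10)
lossy_dump = dump_via_top_k(metrics, lossy_top_10)
missing = sorted(set(metrics) - set(lossy_dump))

print(len(exact_dump), len(lossy_dump), missing)
```

With exact recall the dump contains every row. With the lossy lookup, the skipped row sits below max_m, so neither branch of the UNION ALL returns it.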