Hacker News new | ask | show | jobs
by VoVAllen 535 days ago
Hi, I'm the author of the article. Meta have conducted some experiments on dynamic IVF with datasets of several hundred million records. The conclusion was that recall can be maintained through simple repartitioning and rebalancing strategies. You can find more details here: DEDRIFT: Robust Similarity Search under Content Drift https://arxiv.org/pdf/2308.02752. Additionally, with the help of GPUs, KMeans can be computed quickly, making the cost of rebuilding the entire index acceptable in many cases.