We used something similar to build a “similar articles” feature & it gave us de-duplication essentially for free.
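The comment doesn't say which similarity method was used. The sketch below assumes each article is already a numeric vector (for example an embedding or a TF-IDF vector) and that articles are compared by cosine similarity. The de-duplication comes "for free" because it reuses the same ranking: if an article's nearest neighbour scores close to 1.0, it is probably a duplicate rather than just a related piece. The names `similar_articles`, `find_duplicates`, and the 0.95 threshold are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical cutoff: neighbours scoring at or above this are treated as duplicates.
DUPLICATE_THRESHOLD = 0.95


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_articles(target_id, vectors, k=3):
    """The 'similar articles' feature: rank other articles by similarity to the target."""
    target = vectors[target_id]
    scored = [
        (other_id, cosine(target, vec))
        for other_id, vec in vectors.items()
        if other_id != target_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def find_duplicates(target_id, vectors, threshold=DUPLICATE_THRESHOLD):
    """De-duplication reuses the same ranking: keep only near-identical neighbours."""
    ranked = similar_articles(target_id, vectors, k=len(vectors))
    return [other_id for other_id, score in ranked if score >= threshold]


# Toy vectors: "a-repost" is a lightly edited copy of "a".
vectors = {
    "a": [1.0, 0.0, 1.0],
    "a-repost": [1.0, 0.0, 0.98],
    "b": [0.0, 1.0, 0.0],
    "c": [0.6, 0.8, 0.0],
}

print(similar_articles("a", vectors))
print(find_duplicates("a", vectors))
```

In practice the threshold would need tuning against real data: set it too low and merely related articles get merged, too high and lightly edited reposts slip through.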