Hacker News
by sakisv 686 days ago
Ah yes.

My approach so far has been to first extract the brand names (which are also not written the same way for some fcking reason!), update the strings accordingly, and then compare what's left.

If they have a high similarity (e.g. >95%), they could be merged automatically, and anything between 75% and 95% can be reviewed manually.
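
A minimal sketch of that pipeline in Python, under a few assumptions. The brand alias table, the function names and the sample products are all hypothetical. The similarity score uses the standard library's `difflib.SequenceMatcher.ratio()`, which may well not be the metric actually used here. The 95%/75% thresholds are the ones from the comment above.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: each spelling variant seen in the data maps to
# one canonical brand. Building this list is the manual part.
BRAND_ALIASES = {
    "coca cola": "coca-cola",
    "coca-cola": "coca-cola",
    "cocacola": "coca-cola",
    "nestle": "nestle",
}

AUTO_MERGE = 0.95  # score above this: merge without asking
REVIEW = 0.75      # score in [REVIEW, AUTO_MERGE]: queue for manual review


def normalize(text: str) -> str:
    """Lowercase, strip accents (e.g. 'É' -> 'e') and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents.lower()).strip()


def split_brand(product: str) -> Tuple[Optional[str], str]:
    """Return (canonical_brand, remainder); brand is None if no alias matches.

    Longer aliases are tried first so a full brand name wins over a shorter one.
    """
    name = normalize(product)
    for alias in sorted(BRAND_ALIASES, key=len, reverse=True):
        match = re.search(r"\b" + re.escape(alias) + r"\b", name)
        if match:
            rest = name[:match.start()] + name[match.end():]
            return BRAND_ALIASES[alias], re.sub(r"\s+", " ", rest).strip()
    return None, name


def classify(a: str, b: str) -> str:
    """Bucket a pair of brand-stripped names by their difflib ratio in [0, 1]."""
    score = SequenceMatcher(None, a, b).ratio()
    if score > AUTO_MERGE:
        return "merge"
    if score >= REVIEW:
        return "review"
    return "distinct"


def bucket_pairs(products: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group products by brand, then compare the remainders within each brand."""
    by_brand = defaultdict(list)
    for product in products:
        brand, rest = split_brand(product)
        by_brand[brand].append((product, rest))

    buckets: Dict[str, List[Tuple[str, str]]] = {"merge": [], "review": [], "distinct": []}
    for items in by_brand.values():
        for (p1, r1), (p2, r2) in combinations(items, 2):
            buckets[classify(r1, r2)].append((p1, p2))
    return buckets


if __name__ == "__main__":
    sample = [
        "NESTLÉ KitKat 4 Finger",
        "Nestle KitKat 4 Fingers",
        "Coca Cola Zero 330ml",
        "COCA-COLA Zero 500ml",
    ]
    for bucket, pairs in bucket_pairs(sample).items():
        print(bucket, pairs)
```

Grouping by brand first also means only products of the same brand get compared against each other, which keeps the number of pairwise comparisons down. The thresholds would probably need tuning against real data: in the sample, "zero 330ml" vs "zero 500ml" scores 0.8 and lands in the review bucket, which is the kind of near-miss a human should look at.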

I am not by any means an expert, but maybe using an LLM or a sentence transformer could do the job here?

I gave it a very quick try with ChatGPT, but wasn't very impressed with the results.

Granted, it was around January, and things may have progressed since...

(But then again, why take the easy approach when I can waste a few afternoons playing around with string comparisons?)