Hacker News
by nosecreek 686 days ago
Absolutely! It’s made it difficult to implement some of the cross-retailer comparison features I would like to add. For my charts I’ve just manually selected some products, but I’ve also been trying to get a “good enough but not perfect” string comparison algorithm working.
2 comments

Would maintaining a map of products, product_x[supermarket], with 2-3 values work? I don't suspect that supermarkets would be very keen to change the names (but they might play other dirty games).
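
To make the map idea concrete, here is a rough sketch of what I have in mind. Everything in it is made up for illustration: the product id, the supermarket keys, the retailer names, and the helper functions are all hypothetical, and a real version would load the map from wherever the manually selected products live.

```python
# Hypothetical canonical-product map: one entry per product, keyed by
# supermarket, holding the name that retailer uses. All values are made up.
PRODUCT_MAP = {
    "product_x": {
        "supermarket_a": "Brand X Whole Milk 1L",
        "supermarket_b": "BrandX Milk Whole 1 L",
    },
}


def build_reverse_index(product_map):
    """Map (supermarket, retailer name) -> canonical product id."""
    index = {}
    for product_id, names in product_map.items():
        for supermarket, name in names.items():
            index[(supermarket, name)] = product_id
    return index


def canonical_id(index, supermarket, name):
    """Return the canonical id, or None if the name is not in the map."""
    return index.get((supermarket, name))


REVERSE_INDEX = build_reverse_index(PRODUCT_MAP)
```

A lookup that returns None is the signal that a retailer renamed something (or listed a new product), so the entry needs a manual look.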

I am thinking of doing the same thing for Linux packages in Debian and Fedora.

Ah yes.

My approach so far has been to first extract the brand names (which are also not written the same way for some fcking reason!), update the strings, and then compare what remains.

If they have a high similarity (e.g. >95%) then they could be automatically merged, and then anything between 75% and 95% can be reviewed manually.
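
Roughly, the flow looks like the sketch below. This is not the actual code: the brand alias table is a made-up example, and difflib's SequenceMatcher from the Python standard library is just a stand-in for whichever string comparison ends up being "good enough". Only the two thresholds come from the numbers above.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: each known spelling maps to one canonical brand.
BRAND_ALIASES = {
    "coca-cola": "coca cola",
    "coca cola": "coca cola",
    "cocacola": "coca cola",
}

AUTO_MERGE = 0.95  # above this: merge automatically
REVIEW = 0.75      # from here up to AUTO_MERGE: review manually


def split_brand(name):
    """Return (canonical brand or None, remaining text), lowercased."""
    text = " ".join(name.lower().split())
    # Try longer aliases first so a short alias cannot shadow a longer one.
    for alias in sorted(BRAND_ALIASES, key=len, reverse=True):
        if text.startswith(alias):
            return BRAND_ALIASES[alias], text[len(alias):].strip()
    return None, text


def classify(name_a, name_b):
    """Return 'merge', 'review', or 'distinct' for two product names."""
    brand_a, rest_a = split_brand(name_a)
    brand_b, rest_b = split_brand(name_b)
    if brand_a != brand_b:
        return "distinct"
    # Compare only what is left once the brand has been stripped.
    score = SequenceMatcher(None, rest_a, rest_b).ratio()
    if score > AUTO_MERGE:
        return "merge"
    if score >= REVIEW:
        return "review"
    return "distinct"
```

With this, two listings that differ only in how the brand is spelled collapse to the same remainder and get merged, while the same product in a different pack size tends to land in the manual review band, which is about where I would want it.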

I am not by any means an expert, but maybe using some LLMs or a sentence transformer here could help to do the job?
I gave it a very quick try with ChatGPT, but wasn't very impressed by the results.

Granted it was around January, and things may have progressed...

(But then again why take the easy approach when I can waste a few afternoons playing around with string comparisons)