| HN Mirror



	by gpa 1751 days ago
	The problem: correct string matching at scale. I am aware of fuzzy string matching, but two strings can be more than 90% similar even when the only difference is, for example, one digit in the year of manufacture. My current solution is to first transform (wrangle) the data so that the two strings are represented as consistently as the available information allows, and then to apply constraints on make, model and year, which must be identical. It works pretty well, but I am looking for a more interactive (human-in-the-loop) solution.
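For readers following along, the approach described in the post (normalize, score, then enforce exact constraints) could be sketched roughly as below. This is only an illustration under assumptions: the record fields (`make`, `model`, `year`, `description`), the 0.9 threshold, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are not from the post.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Wrangling step: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] between the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Hard constraints first, fuzzy score second.

    The field names and the threshold are hypothetical.
    """
    for key in ("make", "model", "year"):
        if normalize(str(rec_a[key])) != normalize(str(rec_b[key])):
            return False  # a one-digit year difference is rejected here
    return similarity(rec_a["description"], rec_b["description"]) >= threshold
```

Checking the constraints before the fuzzy score means a pair that differs only in the year is rejected no matter how similar the free-text descriptions are.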

1 comments

wizzwizz4 1751 days ago

I'd just slap a GUI / audit logs on top. Show the intermediate data (the “wrangling”), show the computed similarities, show the conclusion (this met that threshold, and the other was equal, so it's category seven).
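One possible shape for such an audit entry is sketched below. It is only a sketch: the field names, the threshold, and `difflib.SequenceMatcher` as the similarity measure are all assumptions. Each entry keeps the intermediate (wrangled) strings, the computed similarity, the per-field constraint results, and the conclusion, so a reviewer can see why a pair was accepted or rejected.

```python
import json
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def audit_match(rec_a, rec_b, threshold=0.9, keys=("make", "model", "year")):
    """Return a JSON-serializable audit entry for one candidate pair.

    The record layout and the threshold are hypothetical.
    """
    norm_a = normalize(rec_a["description"])
    norm_b = normalize(rec_b["description"])
    score = SequenceMatcher(None, norm_a, norm_b).ratio()
    constraints = {k: rec_a[k] == rec_b[k] for k in keys}
    return {
        "normalized": [norm_a, norm_b],      # the intermediate data
        "similarity": round(score, 3),       # the computed similarity
        "threshold": threshold,
        "constraints": constraints,          # which equality checks passed
        "matched": score >= threshold and all(constraints.values()),
    }


if __name__ == "__main__":
    a = {"make": "toyota", "model": "corolla", "year": 2015,
         "description": "2015 Toyota Corolla LE Sedan"}
    b = dict(a, year=2016, description="2016 Toyota Corolla LE Sedan")
    print(json.dumps(audit_match(a, b), indent=2))
```

Writing one such entry per compared pair (for example as JSON lines) gives the GUI something to display and leaves a trail for later review.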

gpa 1751 days ago

Can you elaborate on the technical details: which language, library or framework would you use?

wizzwizz4 1750 days ago

Tkinter, probably. Or a web interface. Depends on what I'm doing, honestly – the answer will always be “whatever's currently being used”.
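If Tkinter were the choice, a minimal human-in-the-loop review window might look roughly like the sketch below. The function names, the `(left, right, score)` tuple layout, and the button labels are hypothetical; it assumes the candidate pairs and their scores were computed elsewhere.

```python
def format_pair(left: str, right: str, score: float) -> str:
    """Text shown to the reviewer for one candidate pair."""
    return f"A: {left}\nB: {right}\nsimilarity: {score:.3f}"


def review_pairs(pairs):
    """Show each (left, right, score) pair and collect accept/reject decisions."""
    pairs = list(pairs)
    if not pairs:
        return []

    import tkinter as tk  # imported lazily so the module loads without a display

    decisions = []
    remaining = iter(pairs)
    current = {}

    root = tk.Tk()
    root.title("Match review")
    label = tk.Label(root, justify="left", padx=12, pady=12)
    label.pack()

    def show_next():
        pair = next(remaining, None)
        if pair is None:
            root.destroy()  # no pairs left: close the window, ending mainloop
            return
        current["pair"] = pair
        label.config(text=format_pair(*pair))

    def decide(accepted: bool):
        decisions.append((*current["pair"], accepted))
        show_next()

    tk.Button(root, text="Same item", command=lambda: decide(True)).pack(side="left")
    tk.Button(root, text="Different", command=lambda: decide(False)).pack(side="right")

    show_next()
    root.mainloop()
    return decisions


if __name__ == "__main__":
    sample = [("2015 toyota corolla le", "2016 toyota corolla le", 0.955)]
    print(review_pairs(sample))
```

The returned list of decisions could then be appended to the audit log, so that human overrides are recorded alongside the automatic scores.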
