by JaggedJax 1018 days ago
I can't help but think an LLM is the wrong tool for the job here. There are many address validation and standardization services, including databases you can get straight from USPS. Those services will give you real and consistent answers, rather than unknown edge cases that will shift subtly over time as your LLM changes.

Edit: The USPS even runs a program called CASS for this exact purpose. While you may not need to CASS certify yourself, you can either follow its rules or use a service that follows CASS to guarantee your results are accurate.

3 comments

This is a classic XY problem [1]. My _immediate_ reaction to seeing the dev attempt to compare US addresses was “where’s the USPS library?” Using an LLM prompt instead of a vetted library is just the wrong answer to the right problem.

[1]: https://xyproblem.info/

Indeed, and if you wanna self-host, libpostal can do a lot of the heavy lifting in normalising addresses.

It's a good point, but the challenge is that we sometimes get only street1 from a utility, without city/state/postal. We tried USPS and geocoding libraries, but they fail because they often pick a random-ish city, which likely will not match.

I would say sometimes data needs to be rejected as invalid. I don't know the exact scenario here, but you'll never be able to know whether a street number/name alone is unique, since almost any street will have dozens or more matches across the country.

If people are jamming their entire address into address line 1, that is also solved by CASS.
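
To make the "reject it as invalid" idea concrete, here's a rough Python sketch of a pre-check you could run before handing records to any validator. Everything in it is an assumption for illustration: the field names (street1, city, state, postal) just mirror the ones mentioned upthread, and the rule "street plus either a postal code or a city/state pair" is my own guess at a reasonable minimum, not something taken from CASS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressRecord:
    # Hypothetical record shape; field names mirror the ones used upthread.
    street1: str
    city: Optional[str] = None
    state: Optional[str] = None
    postal: Optional[str] = None


def _present(value: Optional[str]) -> bool:
    """True when the field holds something other than None or whitespace."""
    return bool(value and value.strip())


def is_matchable(rec: AddressRecord) -> bool:
    """Assumed rule: a street line alone is ambiguous, so require a
    postal code or a city/state pair alongside it."""
    has_locality = _present(rec.postal) or (
        _present(rec.city) and _present(rec.state)
    )
    return _present(rec.street1) and has_locality


def partition(records):
    """Split records into (matchable, rejected) lists, preserving order."""
    matchable, rejected = [], []
    for rec in records:
        (matchable if is_matchable(rec) else rejected).append(rec)
    return matchable, rejected
```

The rejected pile is what you'd send back to the utility or flag for manual review, instead of letting a geocoder (or an LLM) guess a city for it.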

How would an LLM know the town any better than the alternatives?