Hacker News new | ask | show | jobs
by MBCook 1018 days ago
So you replaced 50 lines of code with a service call to a service that burns massive amounts of electricity/cooling capacity, certainly runs slower, and adds a service dependency that could break on a whim without your knowledge?

And that’s a win?

3 comments

50 lines of code that were never going to work with great accuracy.

Sure, it absolutely might be a win. It depends on just how much accuracy they needed in the checking system in question.

It's also worth noting that one could utilize both. The assumed fast, low cost 50 lines of code on your server that takes care of the easy 97%. And then throw GPT4 at the stray hard cases. It requires being able to correctly identify when your code isn't up to the task of course.

Address matching isn't exactly a new problem. USPS provides an [API](https://developer.usps.com/api/18), and there are several python/ruby/any-other-language libraries/modules that would also just be a call instead of however many dozen lines of custom code you have to test.

Would be very interested in the longevity of this solution. It works today, but will it work in a month/year? A library file on the computer running the rest of the code isn't going to change.

Great accuracy as tested to a continually changing black box. GPT hits are also expensive and often have unpredictable latency. This would have to be integration tested to detect changes to GPT answers.
Correct me if I'm wrong, you can pick which dated GPT API to utilize and expect that to not act as a continually changing black box. I've been using the API for a long time and have been able to pick the version.

So for example: gpt-4-0314, or gpt-3.5-turbo-0613, etc.

The latency issue is definitely true. Ideally the cost could be limited to a very small percentage of hard cases (which you first have to identify).

> So no, pinning the version wouldn't be enough.

You can to an extent dictate GPT's determinism with settings you can pass along in the API, combined with the parent already proclaiming they saw a 100% success rate.

So how do you know it wouldn't be enough? The parent is already saying their test suite indicates it is enough. What tests have you run counter to their claim to show it fails? And how do you know the parent can't increase the determinism even further beyond what they were already using in their testing (and decreasing the risk of negative outcomes by doing so)?

You could also cache addresses seen before.
But isn't this somewhat true of many cloud hosted api calls we already make heavy use of day to day?

I think this is a cute use case. I've recently outsourced categorizing the titles of user created tutorials into groups by relative similarity, to great effect. Took a few minutes.

It's definitely a win in my book.