| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by adventured 1018 days ago

50 lines of code that were never going to work with great accuracy.

Sure, it absolutely might be a win. It depends on just how much accuracy they needed in the checking system in question.

It's also worth noting that one could utilize both. The assumed fast, low cost 50 lines of code on your server that takes care of the easy 97%. And then throw GPT4 at the stray hard cases. It requires being able to correctly identify when your code isn't up to the task of course.

2 comments

turmacar 1018 days ago

Address matching isn't exactly a new problem. USPS provides an [API](https://developer.usps.com/api/18), and there are several python/ruby/any-other-language libraries/modules that would also just be a call instead of however many dozen lines of custom code you have to test.

Would be very interested in the longevity of this solution. It works today, but will it work in a month/year? A library file on the computer running the rest of the code isn't going to change.

link

specialp 1018 days ago

Great accuracy as tested to a continually changing black box. GPT hits are also expensive and often have unpredictable latency. This would have to be integration tested to detect changes to GPT answers.

link

adventured 1018 days ago

Correct me if I'm wrong, you can pick which dated GPT API to utilize and expect that to not act as a continually changing black box. I've been using the API for a long time and have been able to pick the version.

So for example: gpt-4-0314, or gpt-3.5-turbo-0613, etc.

The latency issue is definitely true. Ideally the cost could be limited to a very small percentage of hard cases (which you first have to identify).

link

eatonphil 1018 days ago

LLMs don't seem to be deterministic [0, 1, 2, 3]. So no, pinning the version wouldn't be enough.

[0] https://matt-rickard.com/foundational-models-are-not-enough

[1] https://arxiv.org/pdf/2308.02828.pdf

[2] https://www.sitation.com/non-determinism-in-ai-llm-output/

[3] https://towardsdatascience.com/the-magic-of-llms-prompt-engi...

link

adventured 1018 days ago

> So no, pinning the version wouldn't be enough.

You can to an extent dictate GPT's determinism with settings you can pass along in the API, combined with the parent already proclaiming they saw a 100% success rate.

So how do you know it wouldn't be enough? The parent is already saying their test suite indicates it is enough. What tests have you run counter to their claim to show it fails? And how do you know the parent can't increase the determinism even further beyond what they were already using in their testing (and decreasing the risk of negative outcomes by doing so)?

link