|
|
|
|
|
by ModernMech
433 days ago
|
|
I'd like to second the point made to you in this thread that went without reply: https://news.ycombinator.com/item?id=43702895 It's true that we use tools with uncertainty all the time, in many domains. But crucially that uncertainty is carefully modeled and accounted for. For example, robots use sensors to make sense of the world around them. These sensors are not 100% accurate, and therefore if the robots rely on these sensors to be correct, they will fail. So roboticists characterize and calibrate sensors. They attempt to understand how and why they fail, and under what conditions. Then they attempt to cover blind spots by using orthogonal sensing methods. Then they fuse these desperate data into a single belief of the robot's state, which include an estimate of its posterior uncertainty. Accounting for this uncertainty in this way is what keeps planes in the sky, boats afloat, and driverless cars on course. With LLMs It seems like we are happy to just throw out all this uncertainty modeling and to leave it up to chance. To draw an analogy to robotics, what we should be doing is taking the output from many LLMs, characterizing how wrong they are, and fusing them into a final result, which is provided to the user with a level of confidence attached. Now that is something I can use in an engineering pipeline. That is something that can be used as a foundation to something bigger. |
|
Yeah, I was getting a little self-conscious about replying to everyone and repeating myself a lot. It felt like too much noise.
But my first objection here is to repeat myself- none of my examples are sensitive to this problem. I don't need to understand what conditions cause the calculator/IDE/medical test/LLM to fail in order to benefit from a 95% success rate.
If I write a piece of code, I try to understand what it does and how it impacts the rest of the app with high confidence. I'm still going to run the unit test suite even if it has low coverage, and even if I have no idea what the tests actually measure. My confidence in my changes will go up if the tests pass.
This is one use of LLMs for me. I can refactor a piece of code and then send ChatGPT the before and after and ask "Do these do the same thing". I'm already highly confident that they do, but a yes from the AI means I can be more confident. If I get a no, I can read its explanation and agree or disagree. I'm sure it can get this wrong (though it hasn't after n~=100), but that's no reason to abandon this near-instantaneous, mostly accurate double-check. Nor would I give up on unit testing because somebody wrote a test of implementation details that failed after a trivial refactor.
I agree totally that having a good model of LLM uncertainty would make them orders of magnitude better (as would, obviously, removing the uncertainty altogether). And I wouldn't put them in a pipeline or behind a support desk. But I can and do use them for great benefit every day, and I have no idea why I should prefer to throw away the useful thing I have because it's imperfect.