HN Mirror



	by kgeist 628 days ago
	LLMs can be made more deterministic if you decrease the temperature parameter and use a fixed seed. Outputs can be checked with test suites (e.g. to verify that behavior doesn't change and that there are no performance regressions). For me as a team lead, a human programmer is already a very non-deterministic agent :) Give a non-trivial task to 10 human programmers and they will all solve it differently. Lack of debuggability is a good argument. Maybe it's only a problem if you want a human to debug the generated code? How about letting an LLM iteratively run the code and figure out by itself where it goes wrong (o1 style)?
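To make the temperature/seed point concrete, here is a toy sketch of how sampling usually works: raw scores (logits) are divided by the temperature, passed through a softmax, and a token is drawn with a random number generator. Lower temperature concentrates probability on the top-scoring token, and a fixed seed makes the draws repeatable. The function names and logit values are hypothetical, the logits are held constant instead of being recomputed after every token as a real model would, and this is not any particular vendor's API. In practice hosted models may still vary slightly even with a fixed seed, e.g. due to floating-point nondeterminism in batched GPU inference.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; a smaller temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logits, temperature, seed, n):
    """Draw n token ids from a fixed distribution using a seeded RNG (toy model)."""
    if temperature == 0:
        # Limit T -> 0: greedy decoding, always pick the highest-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [best] * n
    rng = random.Random(seed)  # private RNG: same seed -> same sequence of draws
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)

# Hypothetical scores for three candidate tokens
logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0)[0])  # top token ~0.63
print(softmax_with_temperature(logits, 0.2)[0])  # top token ~0.99
# Same seed and temperature -> identical samples
assert sample_tokens(logits, 0.7, seed=42, n=20) == sample_tokens(logits, 0.7, seed=42, n=20)
```

Temperature only reshapes the distribution; the seed is what pins down which samples you get from it. Temperature 0 (greedy decoding) removes the randomness from sampling entirely.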


jcgrillo 628 days ago

> Maybe it's only a problem if you want a human to debug the generated code?

What will you tell your customers when you're suffering some performance regression and e.g. your Kafka lag is growing without bound? "I'm sorry, the LLM seems to be unable to figure out how to fix the latest performance regression"? You can't just absolve yourself of responsibility like that. You, the human, are responsible for every single thing the computer does in production, and if you absolve yourself of ownership by leaning on an LLM you end up risking catastrophic helplessness. So you'd better be confident the LLM can debug every issue that will ever come up otherwise your decision to use the LLM could come back at you really hard.


kgeist 628 days ago

>You can't just absolve yourself of responsibility like that. You, the human, are responsible for every single thing the computer does in production, and if you absolve yourself of ownership by leaning on an LLM you end up risking catastrophic helplessness

>So you'd better be confident the LLM can debug every issue that will ever come up otherwise your decision to use the LLM could come back at you really hard.

Most of our programmers are PHP devs. They don't know any C. Once, we hit a bug in the PHP runtime which sporadically crashed our entire application. None of the PHP devs were able to fix the bug because they had no experience debugging C code, let alone the PHP runtime specifically. Fortunately, I had experience with C, so I was able to dig into PHP's source code and trace the crashes to a memory corruption bug in PHP that only surfaced when a very specific set of options was enabled, and only under high production load (so we didn't see it during testing). We reverted the changes and the bug disappeared.

What would happen if there was no one to investigate and find the root cause of the bug? Without knowing the cause, they'd probably first try to revert the changes ASAP, and that alone would solve the problem for the customers. The situation is pretty similar to what you're describing: there's a class of problems which requires knowing what happens "under the hood" at a lower level, and many shops, especially, say, in webdev, don't have the luxury of having engineers which know all ins and outs of the entire system. So this situation can happen at any time without any LLMs involved; hardware failures, kernel bugs and runtime bugs can all catch you unprepared.

My point is, the risk is definitely there ("I have no idea what's happening and how to fix it"), but it's not novel: it can happen without LLMs, and people usually find workarounds. As for debuggability, although LLMs can produce pretty bad code that is harder to debug, I think it's still debuggable by a human in the rare event that even a sufficiently smart LLM cannot debug the problem. The code that, say, ChatGPT generates is pretty readable and understandable.


jcgrillo 628 days ago

> there's a class of problems which requires knowing what happens "under the hood" at a lower level, and many shops, especially, say, in webdev, don't have the luxury of having engineers which know all ins and outs of the entire system.

I think this passive framing of the problem, as if it were some "luxury", papers over something important that bears repeating:

If you advertise and provide some service, you own its production behavior, including uptime, correctness, and performance. Failing to maintain these is really bad, and if negligence contributes to the failures, it's malpractice. Negligence includes failing to maintain and train staff properly.

> What would happen if there was no one to investigate and find the root cause of the bug?

I don't see this as a valid excuse, ever. To end up in such a situation is a catastrophic engineering disaster.


kgeist 628 days ago

>Negligence includes failing to maintain and train staff properly.

>To end up in such a situation is a catastrophic engineering disaster.

That was a novel bug in the PHP runtime which manifested only in very specific PHP configurations and under a very specific load. Do you recommend hiring a PHP runtime expert just in case it happens again? Earlier this year we also ran into a rare Linux kernel bug. Do we need to hire a Linux kernel expert, just in case? Or teach PHP programmers how to debug kernel drivers? This kind of "never seen before" stuff happens quite often under high load (even though we do load testing).

What really matters, I think, is how the entire delivery process/pipeline is designed: whether we have tests, QA, monitoring, if it's easy to revert a bad release, if we have on call engineers, tech support, backups, replicas etc. It's not realistic to have experts for every possible problem in the stack, and it's not possible to always have bug-free software; what's more important is whether our engineering practices let us recover quickly from problems we've never seen before. Extending my analogy: if an LLM produces code that suddenly turns out to be unstable (even though it passed all QA checks during testing) and no one immediately knows how to fix it, that's no different from running into a kernel, runtime or hardware bug, where the chance of anyone immediately knowing how to fix the root cause is close to zero, too. You must already have processes in place that let you recover quickly from such unexpected breaking bugs, with or without LLMs. Sure, if the LLM crashes your production server every single day, then it's not a very useful LLM. I hope future coding LLMs will continue to improve.


jcgrillo 628 days ago

> What really matters, I think, is how the entire delivery process/pipeline is designed: whether we have tests, QA, monitoring, if it's easy to revert a bad release, if we have on call engineers, tech support, backups, replicas etc.

Yeah, I agree with this. Mitigating a production issue should not require a deep-dive engineering effort; it should be routine. I just worry that LLM assistance seems likely to turbocharge technical debt accrual, and that freaks me out: the prospect of defaulting on technical debt is nightmare fuel.
