by jcgrillo
628 days ago
> there's a class of problems which requires knowing what happens "under the hood" at a lower level, and many shops, especially, say, in webdev, don't have the luxury of having engineers which know all ins and outs of the entire system.

I think this passive framing of the problem (that this is some "luxury") papers over something important that bears repeating: if you advertise and provide some service, you own its production behavior, including uptime, correctness, and performance. Failing to maintain these is really bad, and if negligence contributes to those failures, it's malpractice. Negligence includes failing to maintain and train staff properly.

> What would happen if there was no one to investigate and find the root cause of the bug?

I don't see this as a valid excuse, ever. To end up in such a situation is a catastrophic engineering disaster.
> To end up in such a situation is a catastrophic engineering disaster.
That was a novel bug in the PHP runtime which manifested only in very specific PHP configurations and under a very specific load. Do you recommend hiring a PHP runtime expert just in case it happens again? Earlier this year we also ran into a rare Linux kernel bug. Do we need to hire a Linux kernel expert, just in case? Or teach PHP programmers how to debug kernel drivers? This kind of "never seen before" stuff happens quite often under high load (even though we do load testing).
What really matters, I think, is how the entire delivery process/pipeline is designed: whether we have tests, QA, and monitoring; whether it's easy to revert a bad release; whether we have on-call engineers, tech support, backups, replicas, etc. It's not realistic to have experts for every possible problem in the stack, and it's not possible to always have bug-free software; what's more important is whether our engineering practices let us recover quickly from problems we've never seen before.

In my analogy, if an LLM suddenly produces unstable code (even though it passed all QA checks during testing) and no one immediately knows how to fix it, that's no different from running into a kernel, runtime, or hardware bug, where the chance of anyone immediately knowing how to fix the root cause is also close to zero. You must already have processes in place that let you recover quickly from such unexpected breaking bugs, with or without LLMs. Sure, if the LLM crashes your production server every single day, it's not a very useful LLM. I hope future coding LLMs will continue to improve.