by throw101010
38 days ago
LLMs do deliver "miracles" in certain cases. If you've experienced it and been blown away by their output (a one-shot functional app from a well-crafted prompt, a new feature added flawlessly to a complicated existing codebase, etc.), it can be tempting to readjust your expectations and think this will work consistently and at a much larger scale. They can assimilate hundreds of thousands of tokens of context in a few seconds or minutes and do exceptional pattern matching beyond what any human can do; that's a main factor in why it looks like "miracles" to us. When a model actually solves a long-standing issue that was never addressed due to a lack of funding, time, or knowledge, it does feel miraculous, and when you are exposed to this a couple of times it's easy to give them more trust, just like you would trust someone who lent you a helping hand a couple of times more than a total stranger.
|
I suppose it's difficult to account for the inconsistency of something able to perform up to standard (and fast!) at one time, but then lose the plot in subtle or not-so-subtle ways the next.
We're wired to see and treat this machine as a human and therefore are tempted to trust it as if it were a human who demonstrated proficiency. Then we're surprised when the machine fails to behave like one.
I have to say, I'm still flabbergasted by the willingness to check out completely and not even stay on top of, or keep a mental model of, what gets produced. But the mind is easily tempted into laziness, I presume, especially when the fun part of thinking gets outsourced and only the less fun work of checking is left. At least that's what makes the difference for me between coding and reviewing. One is considerably more interesting than the other, and they feel much less similar than they should, given that both require gaining a similar understanding of the code.