|
|
|
|
|
by abathur
175 days ago
|
|
I think both of those experiments do a good job of demonstrating utility on a certain kind of task. But this is cherry-picking. In the grand scheme of the work we all collectively do, very few programming projects entail something even vaguely like generating an Nth HTML parser in a language that already has several wildly popular HTML parsers--or porting that parser into another language that has several wildly popular HTML parsers. Even fewer tasks come with a library of 9k+ tests to sharpen our solutions against. (Which itself wouldn't exist without experts trodding this ground thoroughly enough to accrue them.) The experiments are incredibly interesting and illuminating, but I feel like it's verging on gaslighting to frame them as proof of how useful the technology is when it's hard to imagine a more favorable situation. |
|
Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".
I think it's a strong enough example to disprove "they're an interesting phenomenon that people have convinced themselves MUST BE USEFUL ... either through ignorance or a sense of desperation". Not enough to claim they are always useful in all situations or to all people, but I wasn't trying for that. You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim. And I don't think you can. He isn't hyping an AI startup, he has no profit motive to delude him. He isn't a non-technical business leader who can't code being baffled by buzzwords. He isn't new to LLMs and wowed by the first thing. He gave a conference talk showing that LLMs cannot draw pelicans on bicycles so he is able to admit their flaws and limitations.
> "But this is cherry-picking."
Is it? I can't use an example where they weren't useful or failed. It makes no sense to try and argue how many successes vs. failures, even if I had any way to know that; any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden. If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?