by queenkjuul 124 days ago
For me it's just wildly unpredictable. Sometimes it gets a small task perfectly right in one shot, sometimes it invents an absurd new way to be completely wrong.

Anyone trusting it to just "do its own thing" is out of their mind.


For me, I would ask it to do a simple thing and it would give me the tutorial code you could find anywhere on the Internet. Then you ask it to modify that code in a way you can't find in any example online, and it will tell you it's fixed everything, but actually nothing has changed at all or it's completely broken.

I think if someone's goal was just the tutorial code, it would have been very impressive to them that the AI could summon it.

It only takes a cursory knowledge of what LLMs really are to understand why recreating tutorials is easy, but making actual new stuff that is well engineered (which takes way, way more than "passes the test suite") is difficult.

Actual novel stuff is so far out on the long tail of iterations that it's a gamble: it might pop up in an early run, or it might take 2,000 prompts and $20,000 worth of tokens. And it's still not really engineered; it's 10,000 monkeys with typewriters copying random Shakespeare snippets off the chalkboard. At some point you'll get all of Hamlet, but most of the time you'll get garbage, and sometimes you'll get Romeo & The Taming of The Tempest.

This is mostly what I've been using the freebie Gemini chat for: example code, like reminding me of C stdlib stuff, JavaScript, and a bit of web server stuff here and there. I think it would be fun to give Google's agent or CLI stuff a spin, but when I read up here and there about Antigravity, I'm reading that people are getting their accounts shut down for stuff I would have thought was OK, even if they paid for it (well, actually, as usual the real reasons for accounts getting zapped remain unknown, as is today's trend for cloud accounts).

I'm too poor for local LLMs. I think there might be a 2 or 4 GB graphics card in one of my junk PCs, but that's about it lol

I found that unpredictability to be interesting. I'm doing super simple projects with these models, and a year or even six months ago it would give me a block of code that failed as soon as you ran it. You'd have to paste the error in and keep going until it was smoothed out.

The other day though I asked for something simple and it one-shotted the problem. To me, that's new.

I know this success was a statistical outlier, however. I grok how to use it and not to trust it. I'm just shocked that so many smart people fail to understand it.