HN Mirror


by michaelrpeskin 361 days ago

Yes, this!

A couple of weeks ago, I had a little downtime and thought about a new algorithm I wanted to implement. In my head it seemed simple enough that 1) I thought the solution was already known, and 2) it would be fairly easy to write. So I asked Claude to "write me a Python function that does Foo". I spent a whole morning going back and forth, getting crap that was nothing at all like what I wanted.

I don't know what inspired me, but I just started to pretend that I was talking to one of my junior engineers. I first asked for a much simpler function that was on the way to what I wanted (well, technically, it was the mathematical inverse of what I wanted), then I asked it to modify it to add one different transform, and then another, and then another. And then finally, once the function was doing what I wanted, I asked it to write me the inverse function. And it got it right.

What was cool about it is that it turned out to involve more complex linear algebra and more edge cases than I originally thought, and it would have taken me weeks to figure all of that out. But using it as a research tool and junior engineer in one was the key.

I think if we go down the "vibe coding" route, we will end up with hordes of juniors who don't understand anything, and the stuff they produce with AI will be garbage and brittle. But using AI as a tool is starting to feel more compelling to me.

2 comments

ifwinterco 361 days ago

The LLM will never admit it doesn't have a clue what's going on, but over time you develop a sense of when it's onto something and when it's trapped in a loop of plausible-sounding nonsense.

Edit: Also, it's funny how often you can get it to improve its output by just saying "this looks kind of bad for x reason, there must be a way to make it better"

bredren 361 days ago

I have experimented with instructing CC to doubt itself greatly and presume it is not validating anything properly.

It caused it to throw out good ideas for validation and working code.

I want to believe there is some sweet spot.

The constant “Aha!” type responses, followed by self-validating prose that the answer is at hand or within reach, can be intoxicating and cannot be trusted.

The product also seems to be in a constant flux of tuning: some sessions result in great progress, while in others the AI seems as if it is deliberately trying to steer you into traffic.

Anthropic has alluded to this being the result of load. They mentioned in their memo about new limits for Max users that abuse of the subscription levels resulted in subpar product experiences. It’s possible they meant response times and the overloaded 500 responses or lower-than-normal TPS, but there are many anecdotal accounts of CC suddenly having a bad day from “longtime” users, including myself.

I don’t understand how load would impact the actual model’s performance.

It seems like only load-based impacts on individual session context would result in degraded outputs. But I know nothing of serving LLMs at scale.

Can anyone explain how high load might result in an unchanged product performing objectively worse?

rusk 361 days ago

“we will end up with hordes of juniors who don't understand anything and the stuff they produce”

I’ve spent a lot of my career cleaning up this kind of nonsense, so the future looks bright.