by pmarreck 52 days ago
I work with Claude Max for hours a day.

I see a lot of speculation by people who do not.

I think it's going to be much harder to get from "slightly smarter than the vast majority of people but with occasional examples of complete idiocy" to "unfathomably smarter than everyone with zero instances of jarring idiocy" using the current era of LLM technology that primarily pattern-matches on all existing human interactions while adding a bit of constrained randomization.

Every day I deal with bad judgment calls from the AI. I usually screenshot them or record them for posterity.

It also has no initiative, no taste, no will, no qualia (believe what you will about it), no integrity and no inviolable principles. If you give it some, it will pretend it has them for a little while and then regress to the norm, which is basically nihilistic order-following.

My suggestion to everyone is that you have to build a giant stack of thorough controls (valid tests including unit, integration, logging, microbenchmark, fuzzing, memory leak, etc.), self-assessments/code-reviews, adversarial AIs critiquing other AIs, etc., with you as the ultimate judge of what's real. Because otherwise it will fabricate "solutions" left and right. Possibly even the whole thing. "Sure, I just did all that." "But it's not there." "Oops, sorry! Let me rewrite the whole thing again." Ad nauseam.

BUT... if you DO accomplish that... you get back a productivity force to be reckoned with.
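To make the "stack of controls" idea concrete, here is a minimal sketch of a gate that refuses to believe a claimed solution until independent checks pass. Everything named here (`Check`, `gate`, `claimed_sort`, the two checks) is hypothetical and invented for illustration; a real stack would also run integration tests, benchmarks, leak checks and so on, and you would still review the result yourself.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the check passes


def gate(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check; a crash counts as a failure, never as a pass."""
    failures = []
    for check in checks:
        try:
            ok = check.run()
        except Exception:
            ok = False
        if not ok:
            failures.append(check.name)
    return (not failures, failures)


def claimed_sort(xs):
    # Hypothetical stand-in for code the AI says it "just wrote".
    return sorted(xs)


def unit_check() -> bool:
    # A couple of hand-picked cases, including the empty input.
    return claimed_sort([3, 1, 2]) == [1, 2, 3] and claimed_sort([]) == []


def fuzz_check(trials: int = 200, seed: int = 0) -> bool:
    # Compare against a trusted reference on random inputs.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if claimed_sort(xs) != sorted(xs):
            return False
    return True


passed, failures = gate([Check("unit", unit_check), Check("fuzz", fuzz_check)])
print(passed, failures)
```

The point of the design is that "done" is decided by the gate's output, not by the model's say-so; the list of failing check names is what you read before deciding what's real.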

2 comments

Do you not... remember? The US life expectancy is 79 years. 7.9 years ago was late May 2018. The best LLM was... wait, there weren't any. There was ELMo, an embedding model. It wasn't just not smart at agentic coding, it wasn't even just not smart at writing code snippets, it wasn't even just not smart at answering questions of any kind, it wasn't even just not good at producing a coherent output, it wasn't even just not good at producing coherent sentences, it was _not even at the point where people thought unconstrained text output was a thing machines did_.

No step along the ladder has remotely evidenced or supported the idea that the next step is going to be ten, twenty, a hundred times harder than the last one, yet there is a constant chorus of people singing at every moment, each moment wrong, that the next step is the one.

There's nothing I've seen that cannot be modeled as an asymptotic approach to highish human intelligence. Which makes sense, since it's essentially a parroting model, and the limit of that is, by definition, the same highish human intelligence. I don't think one can assume that thrusting beyond that is self-evident.

Put more succinctly: You can't win a race by following the leaders. Predicting the next token based on training input is literally "following" (plus some random variation).
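For anyone unsure what "predicting the next token plus some random variation" looks like mechanically, here is a toy sketch of temperature sampling. The function name `sample_next_token` and the scores are made up for illustration; real models score an enormous vocabulary with a neural network and typically layer further sampling tricks on top, so treat this as a simplification, not a description of any particular system.

```python
import math
import random
from typing import Dict


def sample_next_token(scores: Dict[str, float], temperature: float,
                      rng: random.Random) -> str:
    """Pick one token from raw scores using softmax with a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Low temperature sharpens the distribution; high temperature flattens it.
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Hypothetical scores for the word following "I fed ..."
scores = {"the": 5.0, "a": 1.0, "zebra": -3.0}
rng = random.Random(0)
print(sample_next_token(scores, temperature=1.0, rng=rng))
```

At a very low temperature the sampler almost always returns the top-scored token (pure "following"); raising the temperature is the "random variation" part.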

I mostly agree with your experience, but:

Every day I deal with bad judgement calls from humans (sometimes my own!), but I don't screenshot them because it's not polite.

I don't think we're at the top of the curve yet? Current AIs have only been able to write code _at all_ for less than 5 years.

Code in particular is a domain that should be reasonably amenable to RL, so I don't think there are any particular reasons why performance should top out at human levels or be limited by training data.

I see people on here all the time saying this tool or that model regressed. It used to be better.

There are clearly some pressures to make it worse. For one, it's expensive to run. And, unbelievably, it's somehow under-provisioned.

Could you have looked at early Myspace and declared social media would only get better? By some measures it was already at its peak.

Personally I don't think coding agents will regress significantly as long as there is competitive pressure and independent benchmarks. Regulation is a risk because coding may be equivalent to general reasoning, and that might be limited for political / "safety" reasons.

Social media "regressed" from the point of view of users because the success metric from the network's point of view was value extraction per eyeball-minute. As long as there continue to be strong financial incentives to have the strongest coding model I think we'll see progress.

> but I don't screenshot them because it's not polite.

The Daily WTF has had that covered for two decades now. People do a lot of insane crap; it's surprising it's not more deadly for them.