by hintymad 2 days ago
> There's so much work in delivering products that will carry your brand, and then must be supported.

People think otherwise with AI partly because Anthropic kept telling us that they didn't have to write code or review code any more for most of their work. Their agent swarms just comb through their github, slack and wikis to figure out what to do next, and another swarm of agents just review, test, merge, deploy, A/B test, and revert the code. Boris alone merged nearly 300 PRs in the past week (or two?). So the top research labs seem to have broken the productivity seal.

And then they talk about this recursively self-improving AI that is so powerful, so autonomous that they advocate that every company should be prepared to "pause" the effort. And their Fable/Mythos has this specific restriction, as mentioned in their model card[1], that they are going to reject requests to tune and train models because, well, you guessed it, the models are too powerful to be used by mere mortals.

[1] We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).


I think taking Anthropic or any company in this space at face value is naive at best though. AGI has been 6 months away for years now. Surely anyone can think this through: Anthropic knows what they're doing with their public-facing repositories, they know how to make things enabled by their tech seem impressive. I would consider Bun etc. examples of this.

Realistically, nobody intellectually honest really knows.

The disagreement in timelines usually comes from differences in the used definition of AGI. Many who predict a 2026–2028 arrival define it as: 'the ability to perform any purely cognitive task a human can do.' If we stick strictly to that 'cognitive-only' metric, we are arguably very close.
No, that's just not true.

Plenty of people say that by 2030 we'll have AGI, others estimate 10-20 years.

I personally say 5-15 years.

AI 2027 estimates for 2027: "OpenBrain automates coding"

Genuine question - What part of "that" is just not true?
" AGI has been 6 months away for years now."

No one said 2 years ago that AGI was coming in 2026.

You're absolutely right — they said 2025.
I'm following plenty of people, I never heard 2025.
Plenty of people say vaccines contain microchips. Research is not linear so this simple linear regression I see people doing regarding capability scaling makes no sense. Nobody can forecast a breakthrough, what makes you believe you can?
People can interpolate.

But let's be honest, I do not know, that's true.

I haven't seen technology like this which affects me directly. But from the past we know what disruptive technology looks and feels like. The weaving loom disrupted an industry, as did energy, the steam engine, the internet.

Now we have the most generic technology, an LLM/AI, at a time when every major problem was solved:

We have the internet which allows for fast communication, we have very fast hardware, we have a supply chain which can react/act very fast globally and we have the richest companies on our planet investing unseen amounts of money into this technology. We have a local race between the richest companies in the world and a global race between the biggest world powers on our planet.

And we have the smartest people on our planet involved in this too. A lot of people from academia went to industry, some gave up their tenure for AI.

It would be very ignorant to assume that all of this can't lead to significant and fast change in our society.

It might not, but it's like looking at a huge wildfire 50km away and just going to bed.

> People think otherwise with AI partly because Anthropic kept telling us that they didn't have to write code or review code any more for most of their work.

Even if that were 100% true, it only collapses the coding effort to near zero. Anyone who's built and shipped a real product should know that coding is maybe 50% of the work, and on a mature product it can be much less.

I work on a 10+ year old codebase, there are some weeks I barely change any code.

Sometimes it takes hours of discussion and tracking down decision-makers just to figure out what the intended behavior is.

AI fans would say “this is what the spec is for”.
And even when you can easily find the intended behavior, you need to trace the modules that rely on the current behavior, especially if the change is located near the core of the software.
even boris says they need people with judgment to manage the agents

i dont write code by hand anymore but shipping something people want is as hard (or maybe harder?) as its ever been

Boris also says he stopped using /plan, he writes loops to write prompts, and he simply asks AI to come up with solutions. He also said many times that his agents would comb their emails, slack channels, and Github issues to come up with things to do. When we combine what he has said, it's hard not to have the impression that he was implying full autonomy of their agents. The only thing that the engineers need to do is to build the harness and to issue approvals, rejections, or suggestions.
Yes, there will always be some human bottleneck when it comes to abstract software

i also run loops that comb through slack / github to auto-propose a fix & have another agent auto-review, but you need the human to stamp, and they fail in subtle ways architecturally

Boris who?
Not OP, but I assume he means Boris Cherny, the guy who made Claude Code and is a micro-celebrity in certain developer circles now because of it.
I work on a toy project that has exactly one user (me). On its face it's fairly simple. It's a portal to my media server because I didn't like how Plex worked with regard to searching and filtering. I can look for movies or series by director, studio, publisher, etc. I can rate things, I can find highly rated things. It's great, and instead of bugging Plex support to add new features, I just tell Deepseek to do it. I started it before LLMs were prevalent and now that I have open code I've had Deepseek write and rewrite most of my code and implement new features.

But even with this toy project, and the target market being someone I should know very well (me), I often struggle to figure out what I want the app to do. When I go through brainstorming or grilling sessions it'll often ask me a question about how the product ought to work and I'm just like ¯\_(ツ)_/¯ give me suggestions and I'll let you know.

Genuine creativity is something LLMs struggle with and it kind of makes sense given their design. If you have a complete plan for a feature or even just an idea what the feature should do, that is enough for an LLM. But asking it to think and come up with a new feature idea by itself will always yield mostly extremely basic things you've already thought of. That creativity of "what" to build so it serves a purpose is still very difficult imo and LLMs are not good at it.
This is exactly why I have no interest in LLM created music, stories, movies, etc.
> People think otherwise with AI partly because Anthropic kept telling us that they didn't have to write code or review code any more for most of their work. Their agent swarms just comb through their github, slack and wikis to figure out what to do next, and another swarm of agents just review, test, merge, deploy, A/B test, and revert the code. Boris alone merged nearly 300 PRs in the past week (or two?).

Apart from many other issues with this, heavily subsidized subscription plans won't last forever, and if you start burning your own money on tokens in this way, you'll soon realize it's terribly inefficient.

> the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)

Holy crap that is dark. I like learning about ML for fun, and now I have to assume that their model is intentionally misinforming me to sabotage my learning? It is absolutely bananas that somebody decided that was ok behavior.

time to support open source and local models
I don’t see how that helps, unless you actually mean open source, rather than open weights like most people do. Without everything that goes into the model, including training data, these things are opaque.
Actual open source is hard without a big war chest that allows you to flagrantly steal the training data.
The raw training data is so large that very few parties could host it for free even if there weren't copyright barriers.

But I think you could have a full open source training software pipeline that's set up to work with Wikipedia, Common Crawl, Books3, Library Genesis, Anna's Archive, and whatever other useful data sets people can name. There would just be a step where you have to provide your own copy of Library Genesis (or whatever subset of it you have managed to obtain).

That may very well be the case. In fact, I'm nearly certain that you're right. But it doesn't change the fact that open weight models are altogether insufficient on a number of important dimensions regarding freedom and transparency. And so often (such as the comment I replied to, I think), even technical people seem to just ignore the difference. Open weights are just weights. No amount of open-washing changes that.
Honest question, I wonder why that is? Surely we have smart humans that did not read and learn "all the books". Can AI not be trained by re-reading material multiple times to reinforce?
Start up a SETI@home style of open source LLM training! Assuming there is an ability to merge the sub models trained on each user's home PC into a larger model...
Someone could write a cyberpunk Three Body Problem with this premise.
They kinda did (though it's more inspired by Trusting Trust than AI)

https://corecursive.com/coding-machines-with-don-and-krystal...

TLDR :-)

This comment is not entirely on point with your comment, it circles around and above it looking for lift though.

If you're not doing work that requires your code to stay in home nation data centres, Claude for DeepSeek, DeepClaude (https://github.com/aattaran/deepclaude) is a great way to get better at using Claude-like tools for software development. It even does a pretty good job of putting together cover letters for job applications...

Using DeepClaude is much cheaper than using Claude... For hobby projects, I've found it useful. A recipe (for cooking) management app I've made took a couple of hours to put together and cost US$0.50. Claude is far more expensive.

The downsides of DeepClaude for many are:

- DeepSeek is a Chinese corporation so the Chinese Communist Party may ask for data if it wants it.

- DeepClaude isn't as fast as normal Claude, though it's still pretty fast and I think fast enough (YMMV).

- DeepClaude might not be as optimised for various code issues that Claude may be able to solve more quickly or effectively.

- The same safeguards are probably on DeepSeek, but you won't be "wasting" as much money as you might on using Claude.

Inference focused hardware (https://www.youtube.com/watch?v=nvPqHoVSenE, AI generated speech) may in the medium future cause a large enough cost/energy reduction for LLM tools like Claude to make local LLMs more attractive.

Inference focused hardware would make running Open Source models like DeepSeek on local machines far cheaper and control over safeguards would return to the end user.

Hopefully this leads to a localised LLM provision market where local businesses provide varieties of these "local" LLM services. Here, local could mean on premise through to state or nationally based LLM services. Eventually, government orgs outside of the US may demand this kind of LLM use, in the same way governments legally require data to be stored within national borders for many critical government functions.

A bloke can dream I guess...

...Could affordable inference focused hardware also cause the bottom to fall out of these stock market bending valuations for AI corps and their datacentre obsessions?... Not to mention the societal costs caused by the AI super corps building these data centres. At the moment, they're nearly making a profit... They seem almost like speculative companies... Is that a term?

I’ve been wondering if “you’re not Google” when learning about Google's software dev process applies to Anthropic. Anthropic is a company that A. Has cheap unlimited access to its models and B. Is probably largely insulated from the types of tradeoffs that the rest of industry has had to observe in the post-ZIRP era.

Like did they break through the productivity seal? Or are they willing to spend that much more on it since they see their failure as, like, an existential threat to humanity? I doubt your boss sees your software the same way.

It doesn't need to be an existential threat to humanity - it's an existential threat to their business. They need agentic workflows to work for their business to become profitable. So pouring money into the "no engineers write code anymore, only agents" model is at once R&D, QA, product development, and advertising. They can spend as much of their investors' money on this as they have to because if they can't (sustainably) sell this vision to other companies, their company collapses.
What is post-ZIRP please :-) ?
Zero interest rate policy. When interest rates are near zero you can spend money like it’s free. A lot of what we thought of as, like, normal engineering culture was the result of interest rates being zero.
Tah for that.
Why try to disrupt software though?

Isn't this the classic "dev wants to do start-up, has no skills outside dev, so builds a dev tool" trap?

I don’t think they chose software, software chose them. It's the only real entry point they have to make serious money.
Anthropic is full of shit.