| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by uncircle 368 days ago

> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?

That's a very good question.

Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations, will "AI savvy" coders prefer old, boring languages and tech because there's more low-radiation training data from the pre-LLM era?

The most popular language/framework combination in early 2020s is JavaScript/React. It'll be the new COBOL, but you won't need an expensive consultant to maintain in the 2100s because LLMs can do it for you.

Corollary: to escape the AI craze, let's keep inventing new languages. Lisps with pervasive macro usage and custom DSLs will be safe until actual AGIs that can macroexpand better than you.

1 comments

NitpickLawyer 368 days ago

> Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations

I don't think the premise is accurate in this specific case.

First, if anything, training data for newer libs can only increase. Presumably code reaches github in a "at least it compiles" state. So you have lots of people fight the AIs and push code that at least compiles. You can then filter for the newer libs and train on that.

Second, pre-training is already mostly solved. The pudding seems to be now in post-training. And for coding a lot of post-training is done with RL / other unsupervised techniques. You get enough signals from using generate -> check loops that you can do that reliably.

The idea that "we're running out of data" is way too overblown IMO, especially considering the last ~6mo-1y advances we've seen so far. Keep in mind that the better your "generation" pipeline becomes, the better will later models be. And the current "agentic" loop based systems are getting pretty darn good.

link

bluefirebrand 368 days ago

> First, if anything, training data for newer libs can only increase.

How?

Presumably in the "every coder is using AI assistants" future, it will be an incredible amount of friction to get people to adopt languages that AI assistants don't know anything about

So how does the training data for a new language get made, if no programmers are using the language, because the AI tools that all programmers rely on aren't trained on the language?

The snake eating its own tail

link

NitpickLawyer 368 days ago

You can code today with new libs, you just need to tell the model what to use. Things like context7 work, or downloading docs, llms.txt or any other thing that will pop up in the future. The idea that LLMs can only generate what they were trained on is like 3 years old. They can do pretty neat things with stuff in context today.

link

bluefirebrand 368 days ago

The context would have to be massive in order to ingest an entire new programming language and associated design patterns, best practices and such wouldn't it?

I'm not an expert here by any means but I'm not seeing how this makes much sense versus just using languages that the LLM is already trained on

link

Leynos 368 days ago

Synthetic training data presumably.

link