by TremendousJudge 397 days ago
> But now ChatGPT does that in minutes

But it's trained on Stack Overflow data? What happens in a few years when that data gets more and more outdated? Where will it get its knowledge then?


They're learning from working code in GitHub, IDE "co-pilots"...
But a priori you don't know if the code you find on Github is "good", plus it doesn't come with a handy explanation. The quality of the data is much, much worse.
Fair point, but large, popular, and well-maintained repos would likely be better to learn from than SO. Lots of Stack Overflow conversations have moved to GitHub issues as well.
It will steal our own data and we'll have a big "oopsie! didn't mean to!" moment 5-10 years after.
My point is that there won't even be any data to steal! The novel human-written and human-rated answers just won't exist anymore. Where will it get its answers on C++26 features from? Not the non-existing StackOverflow, that's for sure.
Ah, in the training-data sense, yeah, that makes sense. My bet is that "code artisans" will see a revival in the 300k+ USD range: they'll drop into your codebase like a special forces team to unfuck the AI garbage all the prior "Seniors" implemented.
Why does any LLM need new information to do fundamentally the same thing?

And what makes the data outdated? New code? It can train on that. That, or there is simply nothing new to learn, just new ways to express the same thing.

> Why does any LLM need new information to do fundamentally the same thing?

What makes you think we will be doing fundamentally the same thing in the future? Languages grow and change, systems change, operating systems change, hardware and specs change.

Nothing in computing is ever static.