by mmiliauskas 1095 days ago
First we had outsourcing to Indians, now we have ChatGPT. There is almost a rule of thumb: the less you pay, the bigger the pile of shit you get. At least with ChatGPT you can vet it first, but with the market flush with devs who have 1-2 years of experience, vetting will be shit globally too. I honestly wonder what will start happening to all these LLMs when the training set will get over-represented with cheap, fast, crappy code written by LLMs themselves. I bet "content inbreeding" will become a hot topic in the future.
5 comments

>when the training set will get over-represented with cheap, fast, crappy code written by LLMs themselves

It's already happening. An MIT study came out last week that found that Amazon Mechanical Turk workers hired to do RLHF-type training of models were using ChatGPT to select the best answer. And the web being polluted by AI-generated content, which then gets scraped into Common Crawl and other training datasets, has been an issue for a couple of years now.

It’s just a new tool. It isn’t outsourcing. It sounds like the person you’re replying to uses it the same way I do, which is basically as a way to quickly brainstorm solutions. It acts as a rubber duck that forces you to explain the problem clearly, with the added benefit that it suggests mostly correct code. It’s a bit like pair programming: you tell it the high-level things to do, and it hammers out the boilerplate while you review in real time and point out mistakes as they happen.

I think you’re completely wrong about new devs producing worse code with this new tool. On the contrary, they’re going to be able to learn in a matter of months things it took you years to master, since they now have a private tutor/mentor/reviewer/domain expert/consultant on call 24/7 for $20 a month.

As for using generated content as input, Microsoft has published a few papers showing that curated generated content can be used to train specialized models that are more competent in their domain than the original models. The kicker is they didn’t even use humans to curate the content, they just used existing language models!

IMO if you look at how much better GPT-4 is at coding compared to GPT-3.5, and at the advances in letting GPT test and debug its own code, it’s not going to be “cheaper and worse” in the future.

GPT and LLMs will allow good and seasoned programmers to produce better code in time-constrained environments.

At a quick glance, the claim that a market flush with newer developers would somehow automatically lead to a dip in technical advancement in the near future seems completely made up. I'd believe the opposite, frankly. I'd expect a dip in the near future only if a large portion of the more senior engineers died all of a sudden and took their knowledge with them. That's not going to happen.
Have you seen the JavaScript world?
Or Python world for that matter?

It all seems to be correlated with growth - as in, the fastest-growing areas of the industry, the ones that are hot, are considered a career gateway, and attract the largest number of outsiders, are the biggest circuses on wheels.

It might be that the moat for large LLM providers will be their ability to pay good developers to write good code solely for the purpose of feeding the training corpus of the Coding-LLM-as-a-Service.