by hoseja 1195 days ago
How do they keep churning these out this fast? Feels like this kind of technology should take longer to develop, if only through the baby-with-nine-mums-in-one-month adage.
3 comments

Funding and public interest.

LLMs have been around for a while and they aren't really that different than they were a few years ago tech-wise. The question was always about being able to get good data and compute power for training/running them.

Now that people understand the capabilities of the tech, it's got potential for profit and there's incentive to throw money at it.

OpenAI is treating GPT as a "foundational model". They spend time training the foundational model, then build on top of that. GPT-3 was published in May 2020. GPT-3.5 ("text-davinci-003" and "code-davinci-002") shipped a year ago, and ChatGPT was just a fine-tune on top of those.

So they've had plenty of time to increase the training set, improve the architecture and run GPUs at full power to get a GPT-4.

GPT-3 came out almost 3 years ago. If anything, this has been slow compared to previous releases.