
When ChatGPT came out, my first take was that we had no idea how these things worked, but that once the research caught up, people would learn to cut the cost of the models dramatically. That's what happened.

The approach where "the model has a huge amount of world knowledge and can speak off the cuff about any subject" would inevitably give way to "it can use a search engine, read the results, and then reason based on their content", not least because people will always want to talk about, say, the Bills vs Chiefs game that happened last night, and we can't retrain the whole model every day.
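To make the "search, read, then reason" idea concrete, here is a minimal sketch of the retrieval half, assuming a toy in-memory corpus. The function names `search` and `build_prompt` are hypothetical, the term-overlap scoring is a crude stand-in for a real search engine, and the documents are made up; the actual model call is deliberately left out.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Hypothetical stand-in for a real search engine: score each document
    # by how often the query's terms appear in it, return the top k hits.
    q_terms = set(tokenize(query))

    def score(doc: str) -> int:
        counts = Counter(tokenize(doc))
        return sum(counts[t] for t in q_terms)

    ranked = sorted(corpus, key=score, reverse=True)
    return [doc for doc in ranked[:k] if score(doc) > 0]


def build_prompt(question: str, passages: list[str]) -> str:
    # Put the retrieved text in front of the question so the model reasons
    # over fresh content rather than relying on memorized facts.
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


# Made-up documents standing in for today's search results.
corpus = [
    "Game recap: Bills vs Chiefs, last night (placeholder article text).",
    "Weather forecast for Buffalo this weekend.",
    "A history of the Kansas City Chiefs franchise.",
]
question = "Who won the Bills vs Chiefs game last night?"
hits = search(question, corpus)
prompt = build_prompt(question, hits)
print(prompt)  # this string would go to the model; the model call is omitted
```

The point of the design is that freshness comes from whatever the search step returns, so yesterday's game only needs to be in the index, not in the weights.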

Myself, I like to train models that are simple enough that I can train 20 in 3 minutes and keep the best one. That lets me run experiments quickly to understand how the models work. If I had a model that took 30 minutes to train and a reliable training process, I could use it for my purposes, but since this is a side project, I don't have time for the large number of experiments it would take to get there if each model takes that long to train.
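A minimal sketch of the "train 20 and take the best" workflow, assuming a tiny logistic-regression model trained with SGD on a synthetic 2-D dataset (all of this is illustrative and not the commenter's actual models). The candidates differ only in their random seed, which controls the weight initialization and shuffle order, and the winner is chosen on a held-out validation set.

```python
import math
import random


def make_data(n: int, seed: int) -> list[tuple[list[float], int]]:
    # Synthetic task: label is 1 when x0 + x1 > 0, with 10% label noise.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = int(x[0] + x[1] > 0)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, seed: int, epochs: int = 20, lr: float = 0.1):
    # The seed controls both the initial weights and the visiting order.
    rng = random.Random(seed)
    w = [rng.gauss(0, 1), rng.gauss(0, 1)]
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b


def accuracy(model, data) -> float:
    w, b = model
    correct = sum(int((w[0] * x[0] + w[1] * x[1] + b > 0) == bool(y)) for x, y in data)
    return correct / len(data)


train_set = make_data(500, seed=0)
val_set = make_data(200, seed=1)

# Train 20 candidates that differ only in seed; keep the best on validation data.
results = []
for seed in range(20):
    model = train(train_set, seed)
    results.append((accuracy(model, val_set), seed, model))
best_acc, best_seed, best_model = max(results, key=lambda r: r[0])
print(f"best seed={best_seed} val_acc={best_acc:.3f}")
```

Because the best model is picked on the validation set, its validation score is slightly optimistic; a separate test set would give an honest estimate of how good the winner really is.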

So, for all those reasons, it seems we should have a 'minimal' model that is good at reasoning. The faster a model is to train and run inference on, the faster people can experiment with it and make it better.

The outcry that AI models need to steal ever-increasing amounts of content for training might diminish (I hope so!), but people might come to realize that if you want to research a topic, you could send a web crawler off to look at 1 million documents and have an AI process them for you (I'm living that dream!)