Hacker News new | ask | show | jobs
by ankit219 841 days ago
More than pretraining data, I think the advantage was ChatGPT and how quickly it grew. Remember it was 3.5, and within a month or two, it generated so many actual q&a pairs with rating, feedback, and production level data of how a model will be used by actual users. Those queries and subsequent RLHF + generating better answers for the questions meant the model would have been improved a lot at the SFT stage. Think this is the reason why Anthropic, Google, and Mistral, all three launched their own chatbots, all providing it to users for free and getting realtime q&a data for them to finetune the models on. Google did it with bard too, but it was so bad that not many used it.
1 comments

My understanding is that GPT-4 had been almost fully trained before ChatGPT was released - they spent around six months testing GPT-4 before making it available to the public, ChatGPT came out 31st November 2022, GPT-4 came out March 14th 2023.

But maybe that was still enough time for them to instruction tune it based on ChatGPT feedback, or at least to focus more of their fine tuning iteration in the areas they learned were strong or weak for 3.5 based on ChatGPT usage?

I don't think it was pretrained on knowledge gaps. A version was already available in testing w select customers. The version released to the public would definitely have feedback from those customers, and finetuned/instruction tuned on the data from ChatGPT.

Training data is publicly available internet (and accessible to everyone). It's the SFT step w high quality examples which determines how well a model is able to answer questions. ChatGPT's virality played a part in that in the sense that OAI got the real world examples + feedback others did not have. And yeah, it would have been logical to focus on 3.5's weaknesses too. From Karpathy's videos, it seems they hired a contractual labelling firm to generate q&a pairs.

Also, worth to remind that Bing Chat was launched in February 7 with GPT4 already.