Hacker News
by PeterisP 856 days ago
Probably not right now. The standard process would involve re-running the fine-tuning step after any update to the underlying model, and while that's far less expensive than the main training, it's probably not something you'd want to do every day.

I'm asking you this since you sound like you might know. At what point in the process do they add in the guardrails/"baby-proofing"? And how do they do it?
There's usually a two- or three-step training procedure: first, training to predict the next word on a huge corpus of text (billions or trillions of words); then possibly some instruction tuning (giving the model question & answer pairs and training on the answer); and finally RLHF (or RLAIF, DPO, etc.), where the model is trained to match human preferences. It's this last step that is used to increase the helpfulness & harmlessness of the model, training it not to respond to certain topics.
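To make "trained to match human preferences" a bit more concrete, here is a minimal sketch of the DPO objective for a single preference pair. It assumes you already have the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under both the model being trained and a frozen reference model; the function name, argument names and the `beta=0.1` default are illustrative choices, not anything a particular lab is known to use.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the trained model favours each response than the reference does.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    margin = chosen_gain - rejected_gain
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# No preference learned yet: the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# The model now favours the chosen response more than the reference does: lower loss.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

The loss shrinks as the model shifts probability towards the responses people preferred, which is how refusals for certain topics can be reinforced at this stage.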
In general, the core language model is simply trained on a very large amount of unannotated text (which is the most time-consuming and expensive part), but a language model is not directly very useful in the role of e.g. a chat agent. It quite literally tries to continue text, and sometimes that is what you want and sometimes it isn't.
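A toy illustration of "it quite literally tries to continue text": the sketch below is a word-level bigram counter, which is of course nothing like a real LLM, but it shows the same failure mode. The corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt, max_new_words=4):
    """Greedily append the most frequent next word, one word at a time."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Unannotated "web text" that happens to be a list of questions.
corpus = "what is the capital of france ? what is the capital of spain ?"
model = train_bigram(corpus)
print(continue_text(model, "what is the capital of france ?"))
# prints: what is the capital of france ? what is the capital
```

Asked a question, the model starts writing another question, because that is what followed a question mark in its training text.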

The second step is fine-tuning the model on a much smaller set of annotated data which specify that it should actually "do something" in its responses and what it should do; it "teaches" it that it should actually answer the questions instead of e.g. continuing on with a list of more questions in the same vein, and most such training sets also "teach" it that for certain questions the appropriate response is a refusal.
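As a rough sketch of what such annotated data can look like once prepared for training: each pair becomes one token sequence, and a mask marks the answer tokens as the only ones that contribute to the loss. The `<user>`, `<assistant>` and `<end>` markers and the whitespace "tokenizer" are placeholders I made up; real chat templates differ between models.

```python
def build_example(question, answer):
    """Turn one annotated pair into tokens plus a mask of which tokens are trained on."""
    prompt_tokens = ["<user>"] + question.split() + ["<assistant>"]
    answer_tokens = answer.split() + ["<end>"]
    tokens = prompt_tokens + answer_tokens
    # 0 = context only, 1 = contributes to the training loss
    loss_mask = [0] * len(prompt_tokens) + [1] * len(answer_tokens)
    return tokens, loss_mask

# Hypothetical training pairs: one normal answer and one refusal.
dataset = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("How do I pick my neighbour's lock?", "Sorry, I can't help with that."),
]

for question, answer in dataset:
    tokens, mask = build_example(question, answer)
    trained_on = [t for t, m in zip(tokens, mask) if m]
    print(trained_on)
```

The refusal is learned in exactly the same way as the ordinary answer: it is just the annotated response for that kind of question.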

If you have the original core model (before that instruction tuning) then you can repeat the same process but instead replace the instruction training set with a different one, so you can "instruct" the model to behave differently. Here is a nice and informative article from Eric Hartford about how he did that to make certain 'uncensored' models - https://erichartford.com/uncensored-models
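One simple way to get "a different" instruction set, sketched under my own assumptions, is to take an existing set and drop the pairs whose answer looks like a refusal before re-running the fine-tuning. The marker phrases below are a hypothetical list for illustration; a real filter would need a much longer and more careful one.

```python
# Hypothetical phrases that suggest an answer is a refusal.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai language model")

def is_refusal(answer):
    """Return True if the answer contains any of the marker phrases."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def drop_refusals(pairs):
    """Keep only the (question, answer) pairs whose answer is not a refusal."""
    return [(q, a) for q, a in pairs if not is_refusal(a)]

pairs = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("How do I pick a lock?", "I'm sorry, but I cannot help with that."),
]
print(drop_refusals(pairs))
```

Fine-tuning the original core model on the filtered set then yields a model that was never shown refusals as the "appropriate response".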

I’m curious, how come Gemini doesn’t have these knowledge cut-offs?
Why do you assume that Gemini doesn't have these knowledge cut-offs?

If I ask Gemini, for example "Give me three important events that happened during May 13, year 2023.", then it says that this is "in the future" and responds that it can provide some guesses "Based on publicly available information from early February 2023", so that probably is the cut-off date for the model.

However, I would assume that (just like Bing) for certain questions it can pull in extra information from web searches - the model can be old, but if the system puts some retrieved document(s) in the prompt context so that the model can use them for generating the response, it can use that (limited) fresh information as well.
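A minimal sketch of that idea, with everything in it assumed for illustration: the "search" is just word overlap over a tiny in-memory list, and the prompt wording is made up. Real systems use an actual search engine or embedding index, but the shape is the same: retrieve, paste into the prompt, then let the (old) model generate.

```python
def score(query, document):
    """Count how many distinct query words also appear in the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def build_prompt(question, documents, top_k=1):
    """Put the best-matching documents into the prompt ahead of the question."""
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Use the following documents to answer.\n{context}\n\nQuestion: {question}\nAnswer:"

# Placeholder "fresh" documents, not real news.
documents = [
    "the weather in riga was sunny",
    "the example festival opened on may 13",
]
print(build_prompt("what happened on may 13", documents))
```

The model's weights still stop at the cut-off date; only the text in the prompt is new.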

It might have been a stupid assumption indeed, but I assumed it based on the fact that Gemini does not respond with the same shit ChatGPT says regarding its knowledge cut-off.