Hacker News
by shagie 1174 days ago
There's a "thumbs up" and "thumbs down" next to each generated response.

While this model may not be updated in real time, I would be surprised if that feedback isn't considered when the model is updated, with the positively rated responses being used for retraining.

1 comments

I would love to hear opinions from LLM experts, but I have a feeling they may not be able to update the model at a fundamental level so easily from live feedback.

Based on what I have been reading, training the core model is essentially a statistical analysis of how often words follow each other. This produces a graph (or tree) of words.
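To make that mental model concrete, here is a minimal sketch of a word-follows-word (bigram) count table. It is only a toy: the function names and the sample sentence are made up for illustration, and as far as I know real LLMs are neural networks trained by gradient descent rather than literal count tables, so treat this as a picture of the intuition and not of how GPT is built.

```python
from collections import Counter, defaultdict


def bigram_counts(text):
    """Count how often each word is directly followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
counts = bigram_counts(corpus)
print(most_likely_next(counts, "the"))
```

In a table like this, one extra observation is just one more increment. The question below is about what happens when the "table" is instead billions of learned weights.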

Now, how do you re-evaluate the entire graph based on a single additional piece of feedback without actually retraining the entire model on that feedback? Retraining is costly and the result wouldn't be immediate, unless you're thinking of injecting the input as an initial condition to the already-trained model.

It is possible to fine-tune the model, and I suspect that is done on a regular basis.

https://platform.openai.com/docs/guides/fine-tuning

In particular, from https://platform.openai.com/docs/models/gpt-3:

> With the release of gpt-3.5-turbo, some of our models are now being continually updated. In order to mitigate the chance of model changes affecting our users in an unexpected way, we also offer model versions that will stay static for 3 month periods. With the new cadence of model updates, we are also giving people the ability to contribute evals to help us improve the model for different use cases. If you are interested, check out the OpenAI Evals repository.

The feedback wouldn't be immediately injected back into the model (human curation of the responses is needed to see if the feedback is appropriate).
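As a rough sketch of what that curation step might look like (this is a guess, not OpenAI's actual pipeline): keep only thumbs-up responses that a human reviewer has also approved, and write them out as JSONL prompt/completion pairs for a later fine-tuning run. The record fields (`id`, `rating`, `prompt`, `response`), the `approved_ids` set, and both function names are hypothetical.

```python
import json


def build_training_records(feedback, approved_ids):
    """Keep thumbs-up items that a human curator also approved.

    `feedback` is a list of dicts with hypothetical keys:
    id, rating ("up" or "down"), prompt, response.
    """
    records = []
    for item in feedback:
        if item["rating"] == "up" and item["id"] in approved_ids:
            records.append(
                {"prompt": item["prompt"], "completion": item["response"]}
            )
    return records


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Made-up feedback log, purely for illustration.
feedback = [
    {"id": 1, "rating": "up", "prompt": "2+2?", "response": "4"},
    {"id": 2, "rating": "up", "prompt": "Capital of France?", "response": "Lyon"},
    {"id": 3, "rating": "down", "prompt": "Hi", "response": "Go away"},
]
approved_ids = {1}  # the curator rejected item 2: upvoted but wrong

records = build_training_records(feedback, approved_ids)
print(to_jsonl(records))
```

The point of the second filter is that a thumbs-up alone isn't trustworthy: item 2 was upvoted but is factually wrong, so it never reaches the training file.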

Some of the feedback would be used to train the moderation / supervisor model. https://platform.openai.com/docs/models/moderation