Hacker News new | ask | show | jobs
by bangaladore 533 days ago
To some extent this is how many models are being produced today.

Basically its just a synthetic loop of using a previously developed SOTA (was) model like GPT-4 to train your model.

This can produce models with seemingly similar performance at a smaller size, but to some extent, less bits will be less good.