Hacker News new | ask | show | jobs
by Glohrischi 30 days ago
It was estimated that Mythos is 10T.

And serving is not training. For distilling you need to train the big models to have something to be distilled.