by gwern 987 days ago
> Not all of them per se, take a look at something like Mistral. It's a 7B model displaying incredible performance.

I would, but as far as I can find, they don't say anywhere what their dataset is, and the only thing they say about their instruction-tuned model is that it's trained on 'publicly available' datasets. You know, the ones where a lot of them turn out, under the hood, to draw on the OpenAI API or other pretrained models in some way or another...

> Especially not with pre-filtered/classified pre-training data.

Indeed not! But what exactly is prefiltering or classifying all that data...?