Hacker News
by osmarks 727 days ago
Mistral and Meta release both "instruct" (RLHF) and non-instruct models. The non-instruct ones are in fact pretraining-only, non-RLHF models (though they probably have ChatGPT-ish text in the dataset nowadays, and Meta might have done some extra training on evals...).