by napier
1119 days ago
I’d like to see a model whose pretraining data has had the effluent of the internet intelligently filtered out by LLM and human curation, with much more effort put into including digitised archival sources, the entirety of books, and high-quality media transcripts. I imagine it would yield far better baseline outputs, with much less of the current “requirement” for (over)correction through RLHF masking, which is ultimately disastrous.

Or one tuned on every novel ever written, along with every screenplay.