Hacker News
by haldujai 1187 days ago
I'm not sure this would be as useful as one might think at face value. When you stretch out the training corpus like that, you're going to have more noise, inaccuracies, and refuted facts than you will have correct information.

It's also unclear how useful full scientific articles are. Interestingly, Microsoft/PubMedBERT showed that PMC abstracts were better than full text.