by version_five 1158 days ago
There are lots of reasons this isn't universally true. It only works if you know enough about the data to simulate it, and you're stuck within a space defined by some distribution plus human guesses, which isn't all-encompassing.

The easiest counterexample is training LLMs: how are you going to synthesize useful language examples if you want more? Some version of this is true for most applications.

1 comment

Yeah, the issue is that you can generate data, but it won’t be good data. Training on random strings won’t teach a model language, even though it’s technically data.