Hacker News
by nkingsy 918 days ago
I was under the impression that the data Google has isn’t too valuable for AI training, as quality is so important.

If textbook quality data is needed, then we are basically limited by the current best LLM’s ability to create synthetic textbooks.

Or perhaps this is a path Microsoft (and presumably OpenAI) is trying due to a lack of good non-synthetic data.

1 comment

Google Books would probably be useful, although I don't know if they're able to take advantage of it.