Hacker News
by koakuma-chan 133 days ago
You can download the books and run them through a tokenizer. I did that half a year ago and got ~2M tokens.
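
For anyone who wants to repeat this, here is a rough sketch of the counting step. The comment doesn't say which tokenizer or which files were used, so everything here is an assumption: `simple_tokenize` is a crude regex stand-in (not a real model tokenizer), and `count_tokens` / `count_tokens_in_files` are hypothetical helper names. Counts from a real subword tokenizer will differ, so swap in your own via the `tokenize` argument.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List

# Crude stand-in tokenizer (assumption, not what the comment used):
# each run of word characters is one token, each punctuation mark is one token.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


def count_tokens(text: str, tokenize: Callable[[str], list] = simple_tokenize) -> int:
    """Return the number of tokens the given tokenizer produces for text."""
    return len(tokenize(text))


def count_tokens_in_files(
    paths: Iterable[Path],
    tokenize: Callable[[str], list] = simple_tokenize,
) -> int:
    """Sum token counts over plain-text files (assumed to be UTF-8)."""
    return sum(
        count_tokens(Path(p).read_text(encoding="utf-8"), tokenize) for p in paths
    )
```

Usage would be something like `count_tokens_in_files(Path("books").glob("*.txt"))`, where `books/` is a hypothetical directory of downloaded plain-text files. To get a number comparable to a specific model's context window, pass that model's tokenizer encode function as `tokenize` instead of the regex stand-in.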