by stop50 917 days ago
Most ebook formats are some sort of container + metadata + HTML + CSS. So you only need the files: extract the HTML and feed the text in it to the statistical model.
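A rough sketch of that extraction step, assuming the book is an EPUB (which is a ZIP archive) and that plain text with the tags stripped is good enough for the model. The function names `html_to_text` and `epub_to_text` are made up for illustration, and files are read in name order rather than the reading order given in the book's metadata, which may or may not matter for your statistics.

```python
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup):
    """Return the visible text of one HTML/XHTML document."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def epub_to_text(path):
    """Concatenate the text of every HTML/XHTML file in a ZIP-based ebook."""
    texts = []
    with zipfile.ZipFile(path) as book:
        # Assumption: name order is acceptable; the real reading order
        # lives in the package metadata and is ignored here.
        for name in sorted(book.namelist()):
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                markup = book.read(name).decode("utf-8", errors="replace")
                texts.append(html_to_text(markup))
    return "\n".join(texts)
```

Something like `epub_to_text("book.epub")` would then give you one string to tokenize. Formats that are not ZIP-based would need a different container reader, but the HTML-to-text part should carry over.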