Hacker News
by
stop50
917 days ago
Most ebook formats are some sort of container + metadata + HTML + CSS. So you only need the files: extract the HTML and feed the text in it to the statistical model.
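As a rough sketch of that idea, assuming the book is an EPUB (which is a ZIP container holding XHTML files), something like the following could pull out the plain text using only the Python standard library. The names `TextExtractor`, `html_to_text`, and `epub_to_text` are made up for illustration, and other formats would need a different container reader.

```python
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup):
    """Return the text of one HTML document, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def epub_to_text(path):
    """Assumption: `path` is an EPUB, i.e. a ZIP archive with (X)HTML members."""
    texts = []
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            # CSS, images, and metadata files are ignored.
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                markup = book.read(name).decode("utf-8", errors="replace")
                texts.append(html_to_text(markup))
    return "\n".join(texts)
```

Note that this walks the files in archive order, which is not guaranteed to be reading order; for a bag-of-words style model that probably doesn't matter, but anything order-sensitive would have to read the book's metadata to get the chapter sequence.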