by maoserr 1527 days ago
I wrote something similar: https://github.com/maoserr/epublifier

It's more geared towards longer web novels with 50+ chapters (I've used it on novels with 500 chapters before). Instead of opening each page as a tab, it fetches chapters from a Table of Contents page.

It was written for jnovel/cnovel/knovel sites, but it can handle any generic page that has a list of links.
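
To make the "Table of Contents" idea concrete, here is a minimal sketch of pulling an ordered chapter list out of a page of links. It is not epublifier's actual code; the names `LinkCollector` and `chapter_links` are made up for illustration, and it assumes every `<a href>` on the page is a chapter link, which a real tool would need to filter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (absolute_url, link_text) for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def chapter_links(toc_html, base_url):
    """Return chapter links in page order, dropping repeated URLs.

    Assumption: every link on the page is a chapter.
    """
    parser = LinkCollector(base_url)
    parser.feed(toc_html)
    parser.close()
    seen = set()
    ordered = []
    for url, title in parser.links:
        if url not in seen:
            seen.add(url)
            ordered.append((url, title))
    return ordered
```

Each returned URL can then be fetched in order and its content appended to the book as a chapter.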


I also wrote an alternative solution (not a public repo), but I found that relying on site maps and other link lists generally gave unsatisfactory results. Instead, my solution navigated the site as a user would, following the actual "next chapter" links. That slowed it down (I waited 10 seconds between requests to be polite), but it could handle very large books; the largest I ran it on had 700+ chapters at the time (5000 pages).
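
A rough sketch of that "follow the next chapter link" loop, since the original isn't public. Everything here is an assumption for illustration: the names `NextLinkFinder`, `fetch`, and `crawl`, and the heuristic that a next link is either `rel="next"` or has "next" in its text. The 10-second delay matches the figure above.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class NextLinkFinder(HTMLParser):
    """Finds the first <a> with rel="next" or with "next" in its text."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._current_href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if "next" in (attrs.get("rel") or "").lower().split():
            self.next_href = href
        else:
            self._current_href = href
            self._text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            if self.next_href is None and "next" in "".join(self._text).lower():
                self.next_href = self._current_href
            self._current_href = None


def fetch(url):
    """Download a page as text (assumes UTF-8; replaces undecodable bytes)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, delay_seconds=10,
          max_pages=1000, sleep=time.sleep):
    """Follow next links from start_url, pausing between requests."""
    pages = []
    seen = set()  # guards against loops, e.g. a last chapter linking to the first
    url = start_url
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        html = fetch_page(url)
        pages.append((url, html))
        finder = NextLinkFinder()
        finder.feed(html)
        url = urljoin(url, finder.next_href) if finder.next_href else None
        if url and url not in seen:
            sleep(delay_seconds)  # be polite to the server
    return pages
```

`fetch_page` and `sleep` are parameters so the loop can be tried against canned pages without touching the network or actually waiting.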
This is almost the same approach I used for Bloxp[0]. I keep a list of common "Previous Post" link markups and navigate from the last post in a blog, one by one, back to the first. You can also manually specify the HTML markup to use for crawling a given blog, in case it doesn't match any of the common ones.

I uploaded the site 10 years ago (at first I did it because it was useful to me) and I have made almost no changes since then, but many people still use it as a simple way to export a full blog into an ePub.

[0] http://www.bloxp.com
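
For anyone curious what "common markups plus a manual override" could look like, here is a small sketch. It is not Bloxp's code: the patterns in `COMMON_PATTERNS` are hypothetical examples, and the function name and the `(attribute, value)` pattern format are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical examples of "previous post" markup, not Bloxp's actual list.
COMMON_PATTERNS = [
    ("rel", "prev"),
    ("class", "previous-post"),
    ("class", "older-post"),
]


class AnchorCollector(HTMLParser):
    """Records the attributes of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append({k: (v or "") for k, v in attrs})


def find_previous_post(html, custom_pattern=None):
    """Return the href of the first link matching a pattern, or None.

    A user-supplied (attribute, value) pattern is tried before the
    built-in ones. A pattern matches when `value` is one of the
    space-separated tokens of the attribute.
    """
    patterns = ([custom_pattern] if custom_pattern else []) + COMMON_PATTERNS
    collector = AnchorCollector()
    collector.feed(html)
    for attr, value in patterns:
        for anchor in collector.anchors:
            if value in anchor.get(attr, "").split() and anchor.get("href"):
                return anchor["href"]
    return None
```

Calling this repeatedly, starting from the newest post and stopping when it returns `None`, walks the blog back to its first post.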

Yeah, I had that idea too, but I didn't want to maintain a bunch of different "next chapter" finders.

But I do agree it would be a more reliable way of doing things.

I've written similar scripts to do this, but lncrawl replaced most of them: https://github.com/dipu-bd/lightnovel-crawler/