Hacker News
by
throwup238
565 days ago
You’ll need other heuristics for ToC and indices, but headers/footers are easy to detect via n-gram deduplication. You’ll want to figure out some rolling logic to handle chapter changes, though.
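Roughly what that could look like, as a sketch only: it assumes each page is already a list of text lines, and the names `ngrams` and `strip_repeated` plus the `n`, `window`, and `min_share` parameters are made up for illustration. A line is dropped when every one of its word n-grams also shows up on enough neighbouring pages; counting only within a sliding window of pages is one way to do the "rolling" part, so a running header that changes with the chapter is still caught.

```python
import re
from collections import Counter


def ngrams(line, n=3):
    """Word n-grams of a line; digits are collapsed so page numbers still match."""
    tokens = re.sub(r"\d+", "#", line.lower()).split()
    if len(tokens) <= n:
        # Short lines (e.g. "Page 3") become a single gram.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def strip_repeated(pages, n=3, window=3, min_share=0.6):
    """Drop lines whose n-grams all recur on nearby pages.

    pages: list of pages, each a list of text lines (assumed input shape).
    window: how many pages on each side to compare against (the rolling part).
    min_share: fraction of pages in the window that must contain an n-gram.
    """
    # One set of n-grams per page, so each page counts at most once per gram.
    page_grams = [set().union(*(ngrams(l, n) for l in page)) for page in pages]
    cleaned = []
    for i, page in enumerate(pages):
        lo, hi = max(0, i - window), min(len(pages), i + window + 1)
        counts = Counter(g for grams in page_grams[lo:hi] for g in grams)
        # Require at least two pages so a lone page never deletes itself.
        needed = max(2, min_share * (hi - lo))
        kept = []
        for line in page:
            grams = ngrams(line, n)
            if grams and all(counts[g] >= needed for g in grams):
                continue  # every n-gram recurs nearby: treat as header/footer
            kept.append(line)
        cleaned.append(kept)
    return cleaned
```

Calling `strip_repeated(pages, window=1)` on a document whose running header changes at a chapter boundary should still drop both headers, since each one is only compared against its immediate neighbours. The obvious failure mode is body text that really does repeat across pages (table column headings, for instance), which this would also drop.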
1 comment
ellisv
565 days ago
Headers/footers are also positional.
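A minimal sketch of the positional check, assuming your extractor gives you a vertical coordinate for each line along with the page height. The `Line` type, `split_by_position`, and the 10% `band` are all hypothetical choices for illustration, not any particular library's API.

```python
from typing import NamedTuple


class Line(NamedTuple):
    text: str
    y_top: float  # distance from the top of the page, same units as page_height


def split_by_position(lines, page_height, band=0.1):
    """Split lines into (body, candidates) by vertical position.

    Lines in the top or bottom `band` fraction of the page are only
    candidates for headers/footers; position alone does not prove it.
    """
    body, candidates = [], []
    for line in lines:
        rel = line.y_top / page_height
        if rel < band or rel > 1 - band:
            candidates.append(line)
        else:
            body.append(line)
    return body, candidates
```

In practice you would probably combine the two signals: only run the repetition check on lines that land in the margin bands, which cuts down on false positives from repeated body text.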