Hacker News
by throwup238 565 days ago
You’ll need other heuristics for ToCs and indices, but headers/footers are easy to detect via n-gram deduplication. You’ll want to figure out some rolling logic to handle chapter changes, though.
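A minimal sketch of n-gram deduplication with rolling logic, assuming the document is already split into pages of text lines. The function names, the digit normalization, and the `window`/`min_ratio`/`min_overlap` thresholds are all hypothetical choices for illustration, not the commenter's actual method. The "rolling" part compares each page against the previous and the next few pages *separately*, so a running header that changes at a chapter boundary still matches on one side:

```python
import re

def normalize(line: str) -> str:
    # Lower-case and collapse digit runs so "Page 12" and "Page 13" compare equal.
    return re.sub(r"\d+", "#", line.strip().lower())

def ngrams(line: str, n: int = 3) -> set[tuple[str, ...]]:
    # Word n-grams of the normalized line; lines shorter than n become one gram.
    words = normalize(line).split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def _repeats(grams, neighbour_sets, min_ratio, min_overlap):
    # True if most of `grams` reappears on at least `min_ratio` of the neighbour pages.
    if not neighbour_sets:
        return False
    hits = sum(1 for s in neighbour_sets
               if len(grams & s) >= min_overlap * len(grams))
    return hits >= min_ratio * len(neighbour_sets)

def find_repeated_lines(pages: list[list[str]], n: int = 3, window: int = 3,
                        min_ratio: float = 0.6, min_overlap: float = 0.8):
    """Return {(page_idx, line_idx)} for lines that look like running headers/footers."""
    line_grams = [[ngrams(line, n) for line in page] for page in pages]
    page_sets = [set().union(*grams) for grams in line_grams]
    flagged = set()
    for p, grams_per_line in enumerate(line_grams):
        # Rolling logic (assumed): look back and look ahead independently, so the
        # first and last pages of a chapter still find matching neighbours.
        before = page_sets[max(0, p - window):p]
        after = page_sets[p + 1:p + 1 + window]
        for i, grams in enumerate(grams_per_line):
            if grams and (_repeats(grams, before, min_ratio, min_overlap)
                          or _repeats(grams, after, min_ratio, min_overlap)):
                flagged.add((p, i))
    return flagged
```

A single centered window would miss the header on the last page of a chapter, because half its neighbours carry the next chapter's header; splitting into look-back and look-ahead avoids that. A one-page chapter would still slip through, which is where extra heuristics come in.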

Headers/footers are also positional.
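If the extractor exposes line coordinates (e.g. PDF bounding boxes), position can be used directly. A minimal sketch, assuming a hypothetical `TextLine` record with top/bottom offsets measured from the top edge of the page: a line is a candidate only if it sits entirely in the top or bottom margin band, and it is kept only if the same vertical slot is occupied on enough pages. The 8% margin, 4-unit slot tolerance, and 50% page fraction are illustrative guesses:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class TextLine:
    text: str
    y_top: float        # distance from the top edge of the page (assumed: points)
    y_bottom: float
    page_height: float

def positional_header_footer(pages: list[list[TextLine]], margin: float = 0.08,
                             tol: float = 4.0, min_fraction: float = 0.5):
    """Return {(page_idx, line_idx)} for margin lines whose position recurs across pages."""
    # Step 1: candidates must lie entirely inside the top or bottom margin band.
    candidates = []
    for p, lines in enumerate(pages):
        for i, ln in enumerate(lines):
            in_top = ln.y_bottom <= margin * ln.page_height
            in_bottom = ln.y_top >= (1 - margin) * ln.page_height
            if in_top or in_bottom:
                # Bucket the vertical position so small jitter maps to one slot.
                candidates.append((p, i, int(ln.y_top // tol)))
    # Step 2: count each slot at most once per page, then keep recurring slots.
    slots_seen = {(p, s) for p, _, s in candidates}
    pages_per_slot = Counter(s for _, s in slots_seen)
    needed = min_fraction * len(pages)
    return {(p, i) for p, i, s in candidates if pages_per_slot[s] >= needed}
```

Position alone cannot tell a running header from a one-off note printed in the margin, so in practice it pairs well with the n-gram check: a line that is both in the margin band and textually repeated is a strong header/footer signal.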