Ask HN: How to remove ads from a downloaded HTML file to output an ad-free file?
1 point by suramya_tomar 592 days ago
Is there a tool or script that will let me filter ads out of a page when downloading it with curl, similar to how uBlock Origin works in a browser?

I am downloading snapshots of sites using curl, but the pages contain advertisements that I want to remove. Is there a command-line tool that will do this, so that the output file has no ads in it? In short, I want something like uBlock Origin, but for HTML files that I will be converting to PDFs or EPUBs. Something like:

    curl https://www.google.com | AdRemover.sh | htmltopdf

Most of the solutions I found require updating the /etc/hosts file to stop the ads from showing, but I would rather avoid that if possible.
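For what it's worth, the filter stage in that pipeline could be sketched in Python using only the standard library. This is a rough sketch, not a uBlock Origin equivalent: the host list, the class/id pattern, and the names AdStripper and strip_ads are all made up for illustration, and a real setup would want to load proper filter lists instead. It drops any element whose class or id looks ad-related, or whose src/href points at a listed host, and passes everything else through.

```python
import re
import sys
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist and pattern, for illustration only.
AD_HOSTS = {"doubleclick.net", "googlesyndication.com"}
AD_ATTR = re.compile(
    r"(^|[\s_-])(ad|ads|advert|advertisement|sponsored)([\s_-]|$)", re.I
)
# Elements that never have a closing tag, so there is no subtree to skip.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class AdStripper(HTMLParser):
    def __init__(self):
        # Keep entities unconverted so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_tag = None  # name of the ad element currently being dropped
        self.depth = 0        # nesting count of same-named tags inside it

    def _is_ad(self, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name in ("class", "id") and AD_ATTR.search(value):
                return True
            if name in ("src", "href"):
                try:
                    host = urlparse(value).hostname or ""
                except ValueError:
                    continue
                if any(host == h or host.endswith("." + h) for h in AD_HOSTS):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth += 1
            return
        if self._is_ad(attrs):
            if tag not in VOID:
                self.skip_tag, self.depth = tag, 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags have no subtree: keep or drop them as a unit.
        if not self.skip_tag and not self._is_ad(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
            return
        self.out.append(f"</{tag}>")

    def _emit(self, text):
        if not self.skip_tag:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_ads(html: str) -> str:
    parser = AdStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Filter mode: pass "-" as the only argument to read stdin and write stdout.
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    sys.stdout.write(strip_ads(sys.stdin.read()))
```

Saved as, say, strip_ads.py (a hypothetical name), it would sit in the pipeline as `curl -s URL | python3 strip_ads.py - | htmltopdf`. Since curl does not execute JavaScript, ads that are injected by scripts will not be in the downloaded HTML in the first place; what usually remains are the ad containers and script tags, which is what this kind of filter targets.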
I found https://github.com/ArchiveBox/ArchiveBox/ which is a self-hosted web archiving system. It covers most of my use cases (and I can extend it for additional functionality), so I am going to set it up and try it out.
Thanks all for the help.