If you really want to rake it in, serve text versions of the top 10,000 web sites at static speeds (meaning instantly, I swear: boot a ramdrive (tmpfs) and serve static HTML from nginx, all from RAM). There is so much crap on most sites. Re-crawl hourly.
Monetize via Google AdWords.
EDIT: I'm not sure why I'm being downvoted. I am not suggesting serving PDFs. I am suggesting serving tiny text renders of top sites that are otherwise much too bloated.
The hard part is getting the text and layout right. Many people read many sites for the text, IMO.
So I am suggesting you make an all-text version.
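To make "all-text version" concrete, here is a minimal sketch using only Python's standard `html.parser`. The names `TextRenderer`, `html_to_text`, `SKIP_TAGS` and `BLOCK_TAGS` are made up for illustration, and the tag lists are an assumed minimal set, not a real layout engine. It only shows the easy half (dropping scripts and styles, keeping line breaks); getting layout right on real sites is the hard part.

```python
from html.parser import HTMLParser

# Tags whose contents a reader never sees.
SKIP_TAGS = {"script", "style", "noscript"}
# Tags that start a new line in the text render (assumed minimal set).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "tr"}


class TextRenderer(HTMLParser):
    """Collects visible text and drops script/style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text, one non-empty line per block element."""
    parser = TextRenderer()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><h1>Top Story</h1><p>Hello   <b>world</b>.</p></body></html>"
)
print(html_to_text(sample))
```

The output could then be written out as static files for nginx to serve.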
As an example, the front page of the New York Times right now, copied into Microsoft Word, is 2504 words. When I save that Word document as .txt, I get a 16.4 KB file.
If I try their competition, the Washington Post, I get 237 KB for the full page. If I try the Wall Street Journal, I get 938.15 KB -- nearly a full megabyte. (This is actually more what I was expecting - I'm impressed by the Times.)
Suppose someone desperately wants to glance at the Wall Street Journal from a poor connection where they barely get data. The difference between 12 KB and nearly a megabyte is huge. It's the difference between 4 seconds and 312 seconds: 4 seconds as compared with 5 full minutes.
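The arithmetic behind those numbers, as a quick sketch: "12 KB in 4 seconds" implies a link of about 3 KB/s, and the same link is then applied to the 938.15 KB page. The function and constant names are just for illustration, and a real connection adds latency and per-request overhead that this ignores.

```python
def transfer_seconds(size_kb, rate_kb_per_s):
    """Idealized transfer time: size divided by a constant throughput."""
    return size_kb / rate_kb_per_s


TEXT_KB = 12.0          # text render size used in the comment
FULL_PAGE_KB = 938.15   # Wall Street Journal page size from the comment
RATE_KB_PER_S = TEXT_KB / 4.0  # 3 KB/s, implied by "12 KB in 4 seconds"

text_s = transfer_seconds(TEXT_KB, RATE_KB_PER_S)
full_s = transfer_seconds(FULL_PAGE_KB, RATE_KB_PER_S)

print(f"text render: {text_s:.1f} s")
print(f"full page:   {full_s:.1f} s ({full_s / 60:.1f} min)")
print(f"ratio:       {full_s / text_s:.0f}x")
```

The full page comes out a little over 310 seconds, i.e. just over five minutes, or roughly 78 times longer than the text render.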
So there is a large need, in my opinion, for such a service in case someone desperately wants to see a text render. Preserving any formatting at all helps hugely.
For clarity can you edit your comment to add cozzyd (the OP you mention) - I am sometimes sarcastic but not in this case. I'll then delete this comment.
1) Copyright: re-serving the complete content of the top 100 sites with your own ads does not fall under fair use and would almost certainly be a magnet for lawsuits.
2) Distribution: how do you find your niche of people with poor internet connections and get them to use your mirror instead of whatever site it is they want to read?
No clue on 2. For 1, you could have it be "Opera Mini/Turbo as a service", so that you can argue you are just shifting the viewer to the site, and it's still the user doing the viewing. It helps if you preserve any text ads on the site (or links, with alt text, given you're probably not doing images). You could also replace images with a grainy, black-and-white, very low-fidelity version; this also passes along most ads on the original site without adding hugely to your footprint. To be honest, I also thought perhaps JavaScript etc. could be run, so that even the heaviest sites of all are still downloaded and then turned into text versions. In many cases that can let someone browse a site that is otherwise incredibly slow.
This isn't legal advice, just the approach I would use off the top of my head. I agree with you that it's hard. With the framework "Opera minifier/turbofier as a service" it could work, though. Like a remote browser (in a VM). Like, present it as "Lynx as a service." (Lynx being an old terminal-based text browser.) Something like that, anyway.
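A rough sketch of the "keep links and alt text instead of images" idea, again with only the standard `html.parser`. `AltTextRenderer`, `render_with_alt_text` and the `[image: ...]` / `<url>` output conventions are invented for this example. It does not run JavaScript or strip scripts; it only shows the substitution step.

```python
from html.parser import HTMLParser


class AltTextRenderer(HTMLParser):
    """Keeps text and link targets, and swaps each image for its alt text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None  # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            alt = (attrs.get("alt") or "").strip()
            if alt:  # images without alt text are simply dropped
                self.parts.append(f"[image: {alt}]")
        elif tag == "a":
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            # Append the link target after the link text.
            self.parts.append(f" <{self.href}>")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def render_with_alt_text(html):
    """Return a single line of text with links and image alt text preserved."""
    parser = AltTextRenderer()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


snippet = (
    '<p>Read <a href="https://example.com/story">the story</a> '
    '<img src="ad.png" alt="Sponsor: Acme"> <img src="spacer.gif"></p>'
)
print(render_with_alt_text(snippet))
```

An image-based ad with alt text survives as a short text label, which is the part that matters for the "you are still showing the site's own ads" argument.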
Doesn't Opera Mini or Turbo already provide this service? Perhaps add PPMD proxy text compression with an English dictionary, with a JavaScript browser plugin on top of that. You can't get more efficient than that.
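To illustrate the "compression with a shared English dictionary" idea: Python's standard library has no PPMd, so this sketch swaps in zlib (DEFLATE) with a preset dictionary. That is a different and weaker algorithm, used here only to show the mechanism. `ENGLISH_DICT` is a tiny made-up word list; a real one would be much larger and would have to ship with both the proxy and the plugin.

```python
import zlib

# Hypothetical shared dictionary: both ends must hold the exact same bytes.
ENGLISH_DICT = (
    b" the of and to in is that for it as with was on be at by this have from"
    b" or one had not but what all were when we there can an your which their "
)


def compress(text):
    """Compress text with zlib, primed with the shared dictionary."""
    comp = zlib.compressobj(level=9, zdict=ENGLISH_DICT)
    return comp.compress(text.encode("utf-8")) + comp.flush()


def decompress(blob):
    """Reverse of compress(); fails if the dictionaries differ."""
    decomp = zlib.decompressobj(zdict=ENGLISH_DICT)
    return (decomp.decompress(blob) + decomp.flush()).decode("utf-8")


page_text = "This is the text that was on the front page, and it is all there."
blob = compress(page_text)
print(len(page_text.encode("utf-8")), "bytes raw ->", len(blob), "bytes compressed")
print(decompress(blob) == page_text)
```

How much the dictionary helps depends on the text; short pages benefit most, because the compressor otherwise has little history to match against.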
I tried to extract text from a PDF that already has searchable text, which can be copy-pasted. This should be the easiest task of all, but it made mistakes in every second word.
Then I asked the website to make a PDF into a Word file. It just inserted the whole PDF as a picture in Word.
For comparison with the 16.4 KB text file: when I put the New York Times front page into a Page Size Checker -- http://smallseotools.com/website-page-size-checker/ -- I get 214.23 KB for the full page. That is impressively small, and it's a fast page.