Maybe you could use:
http://commoncrawl.org
I vaguely remember it being a site that stores HTML in tables that can be queried.
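If it helps, here is a rough sketch of how a lookup might start. As far as I know, Common Crawl exposes a URL index at index.commoncrawl.org, with one index per crawl. The crawl ID `CC-MAIN-2024-10` and the helper name `build_index_query` are just my assumptions for illustration, so check the list of available crawls before relying on them. The snippet only builds the query URL and does not touch the network.

```python
from urllib.parse import urlencode

# Assumption: this crawl ID is only an example. The crawls that actually
# exist are listed at https://index.commoncrawl.org/collinfo.json
CRAWL_ID = "CC-MAIN-2024-10"


def build_index_query(url_pattern, crawl_id=CRAWL_ID):
    """Build the index-server URL that lists captures matching url_pattern.

    Hypothetical helper, not part of any Common Crawl library.
    """
    # output=json asks the index server for one JSON record per line.
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"


if __name__ == "__main__":
    # Print the query URL; open it in a browser or fetch it with any HTTP client.
    print(build_index_query("example.com"))
```

Fetching that URL should return one JSON record per line. Each record points at the archive file, offset and length where the captured page is stored, so the HTML itself has to be pulled from there in a second step.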