10 or 20 years ago there wasn't nearly as much readily available data to be mined. Today even moderately high-traffic sites generate gigabytes of log files a day, not to mention the enormous quantity of high-value data available through various APIs.
You don't actually need all of the traffic to draw meaningful conclusions. Tracking a statistically sound random sample of user sessions gives you most of the benefit for pattern analysis.
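A minimal sketch of what session-level sampling can look like, assuming each log event carries a session ID. The function names `keep_session` and `sample_events` and the 10% rate are hypothetical, chosen only for illustration. Hashing the session ID, rather than flipping a coin per log line, keeps or drops whole sessions, so the patterns within each sampled session stay intact.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def keep_session(session_id: str, rate: float) -> bool:
    """Return True if this session falls inside the sample.

    The decision depends only on the session ID, so every event from a kept
    session is kept and every event from a dropped session is dropped.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as a uniform integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Compare integers so rate=1.0 keeps everything and rate=0.0 keeps nothing.
    return bucket < int(rate * 2**64)


def sample_events(
    events: Iterable[Tuple[str, object]], rate: float = 0.1
) -> Iterator[Tuple[str, object]]:
    """Yield only the (session_id, payload) events whose session is sampled."""
    for session_id, payload in events:
        if keep_session(session_id, rate):
            yield session_id, payload


if __name__ == "__main__":
    sessions = [f"session-{i}" for i in range(10_000)]
    kept = sum(keep_session(s, 0.1) for s in sessions)
    print(f"kept {kept} of {len(sessions)} sessions")
```

Because the hash is deterministic, the same session lands in (or out of) the sample across log files, servers, and days, which a per-line `random()` call would not guarantee.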
You've actually got the processing power to do interesting things. I currently have a 250M-record database in my domain of interest. A few years ago, crunching on it was prohibitively time-consuming, but now it flies, and that's before even getting into what it means to be able to run jobs on EC2 with essentially arbitrary compute. (That's direct experience with the same database, btw, not supposition based on comparing two different databases.) Now add in how much more data is being generated, and it shouldn't be hard to believe that drawing conclusions from data is interesting now and will continue to be.