| First of all, John Foreman is great. Read his book "Data Smart" and his blog at http://analyticsmadeskeezy.com/blog/ (disclaimer: I am in no way tied to John Foreman. Also, I work at a company that provides a data processing/collaboration SaaS...for big data! http://www.treasuredata.com) A quote from the OP: >If your business is currently too chaotic to support a complex model, don't build one. Focus on providing solid, simple analysis until an opportunity arises that is revenue-important enough and stable enough to merit the type of investment a full-fledged data science modeling effort requires. This is consistent with what we see among our customers. The use cases we see most often for big data processing boil down to generating reports. Generating reports may sound really prosaic, but as I have learned from our customers, most organizations are very, very far from providing access to their data in a cogent, accessible manner. Just to generate reports/summaries/basic descriptive statistics, incredibly complex enterprise architectures have been proposed, built by a cadre of enterprise architects, and deployed with obscenely high maintenance subscription fees billed by various vendors. That's the reality at many companies. As bad and confusing as the buzzword "big data" is, one good byproduct is that it has forced slow-moving enterprises to rethink their data collection/storage/management/reporting systems. Finally, I am starting to see folks do meaningful predictive modelling on top of large-ish data (on the order of terabytes). Some of them are our customers at Treasure Data, some aren't, but they are definitely not "build[ing] a clustering algorithm that leverages storm and the Twitter API". Instead, they are doing the hard work of thinking through how (or if) the data they collect is meaningful and useful. And that's a good thing. |