by whenisayUH
5221 days ago
This is definitely a hot area, but unfortunately, it is also becoming the thing everyone wants to be attached to.
And so the term is becoming increasingly meaningless. It's 2012's "location-based services" or "gamification" or "cloud" (wait, that's still hot). That said, I suspect big data (at least as I think I understand it) has more legs. But defining what it is matters, or else it becomes yet another buzzword. Are compete.com and quantcast big data? Is eBay, which analyzes terabytes of user metadata, "big data"? Is SeatGeek big data? Is Twitter big data? Just because you have a potentially large database of stuff doesn't mean you are big data. Hopefully the term comes to mean something, but right now I fear it does not.
|
Some of the problems:
- It can take you longer to transfer the data off than your data acquisition source will allow you to store it there.
- Even if you could transfer the data off, you now have the problem of storing it at your site and distributing it intelligently among processing nodes.
- Even if you could solve both of those, the projected power costs associated with that scale of data are infeasible.
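To put rough numbers on the first problem, here is a back-of-the-envelope sketch. Everything in it is an assumption made up for illustration: the 1 PB dataset, the 1 Gbit/s sustained link, the 30-day retention window, and the helper names `transfer_days` and `fits_retention_window`. Real links rarely sustain their nominal rate, so treat the result as an optimistic lower bound.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to move dataset_bytes over a link at a sustained bit rate."""
    seconds = dataset_bytes * 8 / link_bits_per_sec
    return seconds / SECONDS_PER_DAY


def fits_retention_window(dataset_bytes: float,
                          link_bits_per_sec: float,
                          retention_days: float) -> bool:
    """True if the data can be moved off before the source stops holding it."""
    return transfer_days(dataset_bytes, link_bits_per_sec) <= retention_days


if __name__ == "__main__":
    # Hypothetical figures, not taken from any real facility.
    petabyte = 10**15          # bytes
    link = 1e9                 # 1 Gbit/s sustained
    retention = 30             # days the source will keep the data

    days = transfer_days(petabyte, link)
    print(f"transfer takes about {days:.1f} days")            # about 92.6 days
    print("fits window:", fits_retention_window(petabyte, link, retention))
```

With these made-up figures the transfer needs roughly three times the retention window, which is the situation the first bullet describes.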
Most of the talks I see and papers I've come across seem to be focused on better scheduling and more experiment/gather-side filtering based on what you are planning to do with the data. But take this with a grain of salt, as I'm a compilers guy, so I just see the systems stuff secondhand and only know enough to talk about the languages-related issues with people who work in this space for real.