by wongarsu
1771 days ago
> Rather than producing output data and then exiting, like a Hadoop job, a continuous MapReduce program continues to update its results incrementally and in real time as new data is added.
So keeping track of min/max/average as you add new data is now "continuous MapReduce"? Don't get me wrong, a data platform that ingests data and computes useful user-defined aggregates from it sounds useful. But this article feels like an attempt to position that as some kind of incredible industry-leading insight and a novel take on $buzzword, when it really isn't.
|
Yeah, the article is an odd spin on what they're building.
> min/max/average as you add new data
Or a 2D kernel density estimate for your dashboards, a real-time view of 3-neighbors in a graph (nodes+edges definition) sized by log1p(request frequency), ... I find it way easier to write a few custom incremental primitives and piece them together into that kind of algorithm than to write such an algorithm from scratch.
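To make "incremental primitive" concrete, here is a minimal Python sketch of the min/max/average case mentioned above. The name `RunningStats` and its methods are made up for illustration and are not taken from the article or any product. The idea is that each primitive keeps a small state, updates it per record, and can merge with another partial state, which is roughly the "reduce" half of the picture.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical incremental aggregate: count, sum, min, max."""
    count: int = 0
    total: float = 0.0
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, x: float) -> None:
        # O(1) update per new record; no rescan of old data.
        self.count += 1
        self.total += x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial states (e.g. from two shards).
        return RunningStats(
            self.count + other.count,
            self.total + other.total,
            min(self.lo, other.lo),
            max(self.hi, other.hi),
        )

    @property
    def mean(self) -> float:
        # Derived on read from the stored state.
        return self.total / self.count if self.count else math.nan
```

Fancier things (histograms, density estimates) would follow the same shape in this sketch: a small state, an `add`, and a `merge`. Whether that generalizes cleanly to arbitrary algorithms is exactly the open question.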
I'm not crazy about a general-purpose framework/product that tries to allow incremental updates of AllTheThings™ -- my experience thus far suggests that getting it to do what you want (or perform reasonably) on your own data will require enough kludges that you would have been far better off writing the WholeDamnedThing™ yourself.
If they do only support min/max/average and other simple transforms then that's probably not great; they'd be competing directly with something like QuestDB, which is a phenomenal product I'm leaning toward more and more. You don't need millisecond view update times if you can query the whole db in milliseconds.