by avifreedman
2907 days ago
Many of the much faster alternatives (including Linux CLI tools) require a rethinking of attitude, don't work where there are tens to hundreds of people submitting queries, or require different skills. It's tragic to think of all the computrons and watts wasted on Hadoop-ish stuff (map-reducing without filters, and Java itself for most implementations). Still, I wouldn't recommend to most CIOs that they replace Hadoop in all or maybe even most cases, even for data sets of a few TB and smaller. That's partly because of familiarity with querying, and partly because of the solidity of running a multi-tenant system. But I do recommend that they switch to MapR [C++ core and a passable central FS for super-fast Unix-based queries] if they're concerned with efficiency. [For context, in my day job we run multiple clusters handling millions of network traffic summaries/sec, and we are often replacing Hadoop or, more recently, ELK, after people tried to use them for that use case. All of it is well beyond what will fit in RAM. We have our own in-house column-store + streaming combo DB, written in Go/C/C++, that started out as clustered FastBit.]
I doubt the point of the article was to suggest that Linux CLI tools would scale to hundreds of users on the same host. But if each of those users has a host of their very own, such as a laptop, the model could scale very well indeed for small enough datasets.
> I wouldn't recommend to most CIOs they replace Hadoop in all or maybe even most cases, even for few-TB data sets and smaller.
Well, as you point out later, regarding familiarity, once it's in, it's probably too late. What about for a new implementation?
In answering this question, don't get too hung up on a literal reading of "single server" as meaning exactly one machine. For example, a traditional RDBMS with one or more replicas (for performance, redundancy, or both) would still fall under the single-server model. Really, it's about the non-distributed-computing option.
> if they're concerned with efficiency.
The fact that this is an "if" (and I do know that it is, even for startups) is bewildering to me, even more so in the context of distributed architectures, where scaling becomes less linear the more data has to be shared.