by potatoyogurt
2899 days ago
They're well understood by anyone who has used these technologies professionally. I probably should have been more precise with my language, though. Where the boundary lies is really a grey area, and it depends a lot on your specific application. But as a general rule of thumb, my feeling is that for tens to hundreds of GB, you should consider going distributed. And for TBs or more, you almost certainly want to be doing something distributed. Hadoop isn't necessarily the best option then, but it's a powerful tool. I don't know of any resource that really goes deep into the tradeoffs involved, though. There probably is one, given how popular the subject is, but I'm not aware of it.

The problem with the article is that, for a general audience that doesn't understand the tradeoffs of a system like Hadoop, it paints a picture of Hadoop as just a bad, slow tool. It barely acknowledges how rigged the comparison is, aside from mentioning in the conclusion that you might need something like Hadoop for really big data, while the rest is peppered with unnecessarily snide comments about Hadoop that will probably be more memorable. I think it is liable to leave readers more confused about the tradeoffs involved than they were before reading it.

We need to replace the word "professionally" with more precise terms when talking about our industry, because if one were to read "professionally" as "at work", your statement is absolutely false: both the lower bound and the average level of critical thinking and caring in this industry are extremely low. Even ignoring people who obviously have no clue, I can still imagine Hadoop and other big data stacks being sanctioned by "professionals" in management for buzzword-generating reasons, and implemented by "professional" "engineers" for CV-padding reasons.