by kodablah
3952 days ago
"With a relational database the complexity is hidden": that is my main issue. I use Cassandra over relational databases firstly for its linear scalability and multi-master-esque HA. But even ignoring those, I understand exactly what is being scanned and what is not, and I don't have to fight at runtime with an optimizer whose choices depend on several parameters.
|
For example, when I started in Big Data, in less than 3 weeks I was able to optimize some batches just because I read the documentation of the framework used (Pig in this case) and read a small part of the source code to dig deeper. And these were not sophisticated optimizations: I used in-memory joins and reduced the number of relations in the scripts to cut down the number of Hadoop jobs generated (which made the batches 4 times faster).
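To make the in-memory join idea concrete, here is a minimal sketch in Python rather than Pig. The function name `in_memory_join` and the sample data are made up for illustration. The assumption is that one side of the join is small enough to fit in memory, so it can be loaded into a hash table while the large side is streamed past it once, with no shuffle or sort of the large side. Pig offers this kind of join through `USING 'replicated'`.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def in_memory_join(
    small: Iterable[Tuple[Hashable, object]],
    large: Iterable[Tuple[Hashable, object]],
) -> Iterator[Tuple[Hashable, object, object]]:
    """Inner-join two (key, value) streams by loading the small one in memory.

    Hypothetical helper for illustration only: `small` is assumed to fit
    entirely in memory, while `large` is only ever iterated once.
    """
    # Build a hash table from the small side: key -> list of values.
    lookup: Dict[Hashable, List[object]] = {}
    for key, value in small:
        lookup.setdefault(key, []).append(value)

    # Stream the large side and probe the hash table for each record.
    for key, value in large:
        for match in lookup.get(key, ()):
            yield key, value, match


# Made-up sample data: a small reference table and a larger fact table.
countries = [("FR", "France"), ("US", "United States")]
visits = [("FR", 3), ("DE", 7), ("US", 5), ("FR", 1)]

joined = list(in_memory_join(countries, visits))
for row in joined:
    print(row)
```

Records from the large side with no match in the small side (here the `"DE"` row) are dropped, since this sketch is an inner join.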
We often have problems with our HBase database because it’s frequently overloaded (I’m not an IT operator so I can’t give more details), and no one has really mastered it even though it has been in production since 2014.
I do understand that in some cases a NoSQL database is mandatory and, like you, I like to understand what I’m doing. But:
- I’m not working in Silicon Valley
- Most of my co-workers are not geeks (and I respect that)
- It's VERY hard to find guys with real Big Data or NoSQL skills (this comes from a French technical recruiter)
So, while the geek part of me loves Big Data and NoSQL, the rational part prefers using well-known technologies. If NoSQL and Big Data become mainstream and better known, then the rational part will love them too.