Hacker News

by logophobia 2897 days ago

He's probably referring to a combination of features that "NoSQL" databases have.

* Schema-on-read: it makes it easier to ingest large amounts of data and then explore them ad hoc. The schema is only applied when the data is read, so you decide how to interpret the data at the moment you actually use it, which suits one-off exploration. It's not appropriate for production systems, though. For example, a customer gives you a few TB of data; you dump it on Hadoop and query it with Spark. Having to convert it to a relational schema first would slow you down. Again, this is only good for one-off stuff.

* Most SQL databases have column limits (PostgreSQL, for instance, allows at most 1,600 columns per table), so with a very large number of features I'd imagine you'd run into them.

* Scalability. Feature engineering is highly parallelizable, while most traditional SQL databases aren't trivial to scale out, unlike distributed stores such as Cassandra.
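To make the schema-on-read point concrete, here is a minimal standard-library Python sketch, assuming the raw data is JSON lines stored exactly as received. Each reader supplies its own field-to-parser mapping at query time, so the same raw records can be interpreted differently by different queries. The records, field names, and the `read_with_schema` helper are hypothetical; in the scenario above the raw files would sit on Hadoop and be queried with Spark instead.

```python
import json
from typing import Any, Callable

# Raw records stored as received: no schema is enforced when writing.
# Note the inconsistencies (string vs number ids/amounts, missing/extra fields).
RAW_LINES = [
    '{"id": 1, "amount": "12.50", "country": "NL"}',
    '{"id": 2, "amount": 7, "tags": ["new"]}',
    '{"id": "3", "country": "DE", "extra_field": true}',
]

# A "schema" here is just a mapping from field name to a parser function,
# chosen by whoever reads the data (hypothetical, for illustration only).
Schema = dict[str, Callable[[Any], Any]]


def read_with_schema(lines: list[str], schema: Schema) -> list[dict[str, Any]]:
    """Parse each raw line and project it onto the reader's schema.

    Fields missing from a record become None; fields not in the schema
    are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, parse in schema.items():
            value = record.get(field)
            row[field] = parse(value) if value is not None else None
        rows.append(row)
    return rows


# Two readers interpret the same raw data in different ways.
spend_schema: Schema = {"id": int, "amount": float}
geo_schema: Schema = {"id": int, "country": str}

spend = read_with_schema(RAW_LINES, spend_schema)
geo = read_with_schema(RAW_LINES, geo_schema)
print(spend)
print(geo)
```

Nothing was converted up front: the type coercion and the choice of fields happen only in `read_with_schema`, which is why this is convenient for exploration and fragile for production (a bad record only surfaces when someone reads it).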
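The scalability point rests on per-row features having no cross-row dependencies, so the data can be split into partitions that are processed independently and then concatenated. The Python sketch below shows that structure; the feature function, the `partition` helper, and the sample rows are hypothetical. It uses a thread pool only to keep the example self-contained: for CPU-bound pure-Python work the GIL limits thread parallelism, so in practice you would use processes or a cluster framework such as Spark, which applies the same partition-and-map pattern.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def features_for_row(row: dict) -> dict:
    """Hypothetical per-row features: each depends only on its own row."""
    amount = row["amount"]
    return {
        "id": row["id"],
        "log_amount": math.log1p(amount),
        "is_large": amount > 100,
    }


def features_for_partition(part: list[dict]) -> list[dict]:
    # Partitions share no state, so they can run on any worker in any order.
    return [features_for_row(r) for r in part]


def partition(rows: list[dict], n: int) -> list[list[dict]]:
    # Split rows into at most n contiguous chunks (the last may be shorter).
    size = max(1, math.ceil(len(rows) / n))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


rows = [{"id": i, "amount": float(i * 37 % 250)} for i in range(10)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in input order, so concatenating the
    # per-partition results preserves the original row order.
    results = pool.map(features_for_partition, partition(rows, 4))
    parallel = [f for chunk in results for f in chunk]

# The partitioned computation matches a plain sequential pass.
sequential = features_for_partition(rows)
print(len(parallel), parallel == sequential)
```

Because the sequential and partitioned results are identical, adding more workers (or machines) changes only throughput, not the output, which is what makes this kind of workload easy to scale horizontally.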