by wvenable 3166 days ago
> Sparse heterogeneous data is often the type of data stored in NoSQL dbs.

I still can't imagine what sparse heterogeneous data exists in the world that makes sense to store. Any type of querying or processing requires some kind of structure (even if implicit in the code) which you can just put in different table structures.

You have to make sense of data to process it and that kind of implies a structure, doesn't it? Am I missing some obvious example of heterogeneous data?

2 comments

Customer Analytical Record / Feature Engineering Store

One customer column, tens of thousands of attribute columns.

If you need everything about a customer, it is a single O(1) fetch, which makes it perfect for driving chat bots, call centres, websites, operational decisioning engines, dashboards etc. Almost every large company will have one of these.

You can't really do it properly in a relational system because (a) you hit the column limit, (b) the data is often sparse, i.e. lots of NULLs everywhere, and (c) the system needs to be distributed since it often gets a lot of load.
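
To make the shape of such a record concrete, here is a minimal sketch in Python. It is only an in-memory stand-in for a wide-column store, not any particular product's API; the class name `CustomerAttributeStore`, the customer ids, and the attribute names are all made up for illustration. Each customer row holds only the attributes that are actually set, so there are no NULLs, and fetching everything about one customer is a single keyed lookup (O(1) on average for a hash map).

```python
from collections import defaultdict


class CustomerAttributeStore:
    """Hypothetical in-memory stand-in for a wide-column store."""

    def __init__(self):
        # customer_id -> {attribute_name: value}; unset attributes are simply absent.
        self._rows = defaultdict(dict)

    def set(self, customer_id, attribute, value):
        """Set one attribute on one customer row."""
        self._rows[customer_id][attribute] = value

    def fetch(self, customer_id):
        """Return a copy of everything known about a customer in one keyed lookup."""
        return dict(self._rows.get(customer_id, {}))


store = CustomerAttributeStore()
store.set("c-1001", "churn_score_30d", 0.12)
store.set("c-1001", "preferred_channel", "chat")
store.set("c-2002", "last_call_centre_contact", "2016-03-01")

# One lookup returns only the attributes that exist for this customer.
profile = store.fetch("c-1001")
```

The same data in a single relational table would need one column per attribute name that any customer has ever had, with most cells NULL.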

What would the attribute columns consist of? My experience has been with named columns defined individually by humans, of which I've never seen more than a few hundred; how do you get tens of thousands? Are they a different kind of thing?

Even companies that define attributes purely by hand can easily get into the thousands. I have seen it many times where you hit the column limit of a SQL database.

But where you get into tens/hundreds of thousands is when you have machine learning models automatically selecting and storing important features from the data.

Tick (market) data is another good example of this. A given 'Tick' is just an event that can have any of up to thousands of different attributes set (often just a handful).
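
A rough sketch of what that sparsity looks like, again in Python. The field names (`bid`, `ask`, `last_price`, `last_size`), the symbols, and the helper `make_tick` are invented for illustration and do not come from any real market data feed. Each tick carries only the fields that were set, and the last few lines compare that with the number of cells a dense table with one column per field would need.

```python
def make_tick(symbol, timestamp, **fields):
    """Build a tick event holding only the fields that were actually set."""
    tick = {"symbol": symbol, "ts": timestamp}
    tick.update({k: v for k, v in fields.items() if v is not None})
    return tick


ticks = [
    make_tick("ABC", 1, bid=10.0, ask=10.1),
    make_tick("ABC", 2, last_price=10.05, last_size=200),
    make_tick("XYZ", 3, bid=55.2),
]

# Union of every field name seen across all events: the columns a dense table would need.
all_fields = set()
for tick in ticks:
    all_fields.update(tick)

stored_cells = sum(len(tick) for tick in ticks)  # cells actually stored
dense_cells = len(ticks) * len(all_fields)       # cells in a one-column-per-field table
```

With thousands of possible fields and a handful set per event, the gap between `stored_cells` and `dense_cells` grows accordingly.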