by sparkman55 4278 days ago
Depending on how 'huge' your timeseries are, you might be pleasantly surprised with Postgres. Postgres scales to multiple TB just fine, and of course the software can be easier to write since you have SQL and ORMs to rely on. It's also an incredibly mature and stable software package, if you're worried about future-proofing.

Some (constantly growing) timeseries can be stored one sample per row, while other (static or older) timeseries can be stored in a packed form (e.g. an array column).
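Here is a minimal Python sketch of the per-row to packed conversion. It assumes each sample is a (series_id, timestamp, value) tuple and that one packed record per series per calendar day is the granularity you want. The function name, the day-sized bucket and the dict layout are illustrative choices, not anything Postgres prescribes. In Postgres the packed form would be a single row holding array columns (for example a hypothetical `ts timestamptz[]` plus `vals double precision[]`); here plain lists stand in for them.

```python
from collections import defaultdict
from datetime import date, datetime


def pack_rows(rows):
    """Pack per-row samples into one record per (series_id, day).

    `rows` is an iterable of (series_id, ts, value) tuples, i.e. the
    "one sample per row" layout. The result maps (series_id, ts.date()) to
    a dict of parallel lists, standing in for a row with array columns.
    This layout is a hypothetical illustration, not a Postgres API.
    """
    packed = defaultdict(lambda: {"ts": [], "values": []})
    # Sort first so each packed array ends up in time order.
    for series_id, ts, value in sorted(rows):
        record = packed[(series_id, ts.date())]
        record["ts"].append(ts)
        record["values"].append(value)
    return dict(packed)


# Usage: three per-row samples, deliberately out of order.
rows = [
    ("cpu", datetime(2014, 1, 2, 0, 0), 0.4),
    ("cpu", datetime(2014, 1, 1, 0, 1), 0.7),
    ("cpu", datetime(2014, 1, 1, 0, 0), 0.5),
]
packed = pack_rows(rows)
print(packed[("cpu", date(2014, 1, 1))]["values"])  # [0.5, 0.7]
```

The bucket size is the main knob: bigger buckets mean far fewer rows and less per-row overhead, at the cost of having to rewrite a whole array whenever a sample in it changes. That trade-off is why packing suits static or older data better than series that are still growing.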

I find that most of the time, "Big Data" isn't really all that big for modern hardware, and so going through all of the extra software work for specialized data stores isn't really all that necessary. YMMV, of course, depending on the nature of your queries.


>I find that most of the time, "Big Data" isn't really all that big for modern hardware, and so going through all of the extra software work for specialized data stores isn't really all that necessary. YMMV, of course, depending on the nature of your queries.

I totally agree. Most useful "big data" is time-series data, and it isn't all that huge compared to images, videos, etc.

That being said, I think the reason to adopt something like Hadoop or an MPP engine is not storage but ease of querying: while Postgres can handle storing terabytes of data, joining two terabyte-scale tables can get a little iffy. This gets even more complex if you start packing data into array columns for space efficiency.
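To make the packing complication concrete, here is a rough Python sketch. It assumes a hypothetical packed layout in which each (series_id, day) key maps to parallel lists of timestamps and values. Packed records can't be joined on timestamp directly; every query first has to expand them back into rows (roughly what Postgres's `unnest()` does for array columns) and only then join. The function names are made up for illustration, and the in-memory hash join is a stand-in for whatever join strategy a real engine would pick.

```python
from datetime import date, datetime


def unpack(packed):
    """Expand packed records back into flat (series_id, ts, value) rows.

    `packed` maps (series_id, day) -> {"ts": [...], "values": [...]}, a
    hypothetical stand-in for rows with array columns.
    """
    for (series_id, _day), record in packed.items():
        for ts, value in zip(record["ts"], record["values"]):
            yield series_id, ts, value


def join_on_ts(left_rows, right_rows):
    """Hash-join two flat row streams on (series_id, ts).

    Assumes (series_id, ts) is unique on the right side; duplicates would
    overwrite each other in the in-memory index.
    """
    index = {(series_id, ts): value for series_id, ts, value in right_rows}
    return [
        (series_id, ts, value, index[(series_id, ts)])
        for series_id, ts, value in left_rows
        if (series_id, ts) in index
    ]


# Usage: only the 00:01 sample exists on both sides, so only it survives.
day = date(2014, 1, 1)
left = {("cpu", day): {"ts": [datetime(2014, 1, 1, 0, 0), datetime(2014, 1, 1, 0, 1)],
                       "values": [0.5, 0.7]}}
right = {("cpu", day): {"ts": [datetime(2014, 1, 1, 0, 1)], "values": [42.0]}}
print(join_on_ts(unpack(left), unpack(right)))
```

Note that the join needs an index over one side's fully expanded rows. With two terabyte-scale tables that index no longer fits comfortably in memory, which is roughly where the "iffy" part starts.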

There is an argument to be made that historical/archival data aren't all that useful and thus do not need to be analyzed: that was definitely my assumption coming from finance. However, I've been surprised by how far back some of our customers at Treasure Data go to mine insights from their data.