HN Mirror


by jnazario 4100 days ago

i concur.

while it's likely i've implemented my SQL horribly, i can say that after a few days of millions of hits per day in my time series database, searches became horribly slow and interactive use became unresponsive. in my case it was a set of botnet sinkholes that i was recording.

so yes you can, but on the high volume side of things (for some cutoff of "high volume") it falls over pretty dramatically and continues to degrade.

time series data has a few unique properties that a full SQL solution doesn't optimize around, like write-once/read-many. a purpose-built TSDB solution is built for this.

2 comments

mborch 4099 days ago

The article states the opposite: that it's write-heavy.

The difficulty in managing time series data is that you need to do roll-ups and generally avoid doing the same work twice - that is, read the same rows over and over again.

If you're doing the same work over and over, it's always going to be slow. Don't do that! InfluxDB could presumably be built on top of PostgreSQL. It just manages the data lifecycle. But that would be a polyglot mashup project then and not something you could sell to VCs.
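A minimal sketch of the roll-up idea above, in Python: each hit is folded into a per-hour counter as it arrives, so a later query sums a few buckets instead of rereading raw rows. The names `HourlyRollup`, `bucket_start`, and the one-hour bucket width are assumptions for illustration, not anything taken from the article or from InfluxDB.

```python
from collections import Counter

BUCKET_SECONDS = 3600  # assumed roll-up granularity: one hour


def bucket_start(ts: int, width: int = BUCKET_SECONDS) -> int:
    """Floor a Unix timestamp to the start of its bucket."""
    return ts - (ts % width)


class HourlyRollup:
    """Hypothetical roll-up: keeps per-bucket hit counts so that
    queries never have to rescan the raw rows."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, ts: int) -> None:
        # Do the aggregation work once, at write time.
        self.counts[bucket_start(ts)] += 1

    def hits_between(self, start: int, end: int) -> int:
        """Total hits in buckets whose start lies in [start, end)."""
        return sum(n for b, n in self.counts.items() if start <= b < end)


rollup = HourlyRollup()
for ts in (0, 10, 3599, 3600, 7300):
    rollup.record(ts)

first_hour = rollup.hits_between(0, 3600)
```

The trade-off is that the raw rows are only needed for queries finer than the bucket width; everything coarser reads the counters.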


gtrubetskoy 4100 days ago

Yeah, but is that really TS-specific? High volume is high volume, regardless of the type of data, and to address this you may need to use something like a fast key-value store.


jnazario 4100 days ago

maybe i could have shoehorned it into a KV store and done range queries, but again this was stuff like "timestamp, srcip, srccc, srcasn, eventid". the main vector is a timestamp, and every query has a timestamp range associated with it. these are written once, never updated. other data stores don't optimize for those parameters.
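a rough sketch of what "optimize for those parameters" could look like, in Python: because records are written once and (by assumption here) arrive in timestamp order, an append-only list stays sorted and a timestamp range becomes two binary searches. the field names come from the comment above; `EventLog`, the in-order assumption, and the sample values are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple


class Event(NamedTuple):
    timestamp: int  # Unix seconds
    srcip: str
    srccc: str
    srcasn: int
    eventid: int


class EventLog:
    """Hypothetical append-only store, kept sorted by timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Assumption: writers deliver events in timestamp order,
        # so appending keeps both lists sorted with no re-sort.
        if self._timestamps and event.timestamp < self._timestamps[-1]:
            raise ValueError("events must arrive in timestamp order")
        self._timestamps.append(event.timestamp)
        self._events.append(event)

    def between(self, start: int, end: int) -> List[Event]:
        """Events with start <= timestamp <= end, via binary search."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return self._events[lo:hi]


log = EventLog()
log.append(Event(100, "192.0.2.1", "US", 64500, 1))
log.append(Event(200, "192.0.2.2", "DE", 64501, 2))
log.append(Event(300, "192.0.2.3", "BR", 64502, 3))

recent = log.between(150, 300)
```

there are no updates to handle, so nothing here needs locking per row or an index rebuild; a real store would still have to deal with late or out-of-order arrivals, which this sketch simply rejects.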
