


	by cevian 2873 days ago
	To put a number on this claim: TimescaleDB can currently handle up to 100 TB of data.

3 comments

wenc 2873 days ago

For pure time-series, that is huge.

The bulk of time-series data is floating point numbers, which are fairly tiny in terms of storage.

100 TB is a lot of time series.
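
To give a rough sense of scale, here is a back-of-the-envelope sketch. The sizes are assumptions, not figures from this thread: decimal terabytes, 8-byte floats, 8-byte timestamps, and no row headers, indexes, or compression, so real capacity in Postgres would be lower.

```python
# Back-of-the-envelope: how many readings fit in 100 TB?
# Assumptions (not from the thread): decimal terabytes, 8-byte float values,
# 8-byte timestamps, and no row headers, indexes, or compression.
TB = 10**12
capacity_bytes = 100 * TB

FLOAT_BYTES = 8      # one double-precision value
TIMESTAMP_BYTES = 8  # one 64-bit timestamp


def readings_that_fit(capacity: int, bytes_per_reading: int) -> int:
    """Whole number of readings of the given size that fit in the capacity."""
    return capacity // bytes_per_reading


values_only = readings_that_fit(capacity_bytes, FLOAT_BYTES)
with_timestamp = readings_that_fit(capacity_bytes, FLOAT_BYTES + TIMESTAMP_BYTES)

print(f"{values_only:,} bare float values")
print(f"{with_timestamp:,} (timestamp, value) pairs")
```

Under those assumptions that is 12.5 trillion bare values, or 6.25 trillion timestamped readings.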


comboy 2873 days ago

Where does the limitation come from? Or is it just the amount of data that it's been tested on?


mfreed 2873 days ago

TimescaleDB clustering is currently limited to a single primary with multiple read-only replicas (which provide both HA failover and scaling concurrent reads).

So the limitation here is the storage capacity available to the nodes, and the 100 TB figure above reflects the amount of data we've seen in use in various RAIDed or network-attached storage environments (like the cloud).

One interesting thing here is that you can "elastically" add a new disk to an existing hypertable, and new writes will be automatically load balanced across the new disk. (In Postgres speak, we support multiple tablespaces in a single hypertable, and we allow you to dynamically add a new tablespace to an existing hypertable: https://docs.timescale.com/api#attach_tablespace )
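
As a minimal sketch of what that looks like in practice, the helper below just builds the two SQL statements involved. The tablespace name `disk2`, the path `/mnt/disk2/pgdata`, and the hypertable name `conditions` are hypothetical, and the helper itself is not part of any library. `CREATE TABLESPACE` is standard PostgreSQL; the `attach_tablespace(tablespace, hypertable)` argument order is assumed from the linked docs, so check it against the version you run.

```python
import re
from typing import List

# Conservative check for plain, unquoted SQL identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def attach_disk_statements(tablespace: str, location: str, hypertable: str) -> List[str]:
    """Build the SQL to create a tablespace on a new disk and attach it.

    Hypothetical helper for illustration only; it returns SQL text and
    does not connect to a database.
    """
    for name in (tablespace, hypertable):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return [
        # Standard PostgreSQL: register a directory on the new disk.
        f"CREATE TABLESPACE {tablespace} LOCATION '{location}';",
        # TimescaleDB: make the hypertable place new chunks there too
        # (argument order assumed from the linked docs).
        f"SELECT attach_tablespace('{tablespace}', '{hypertable}');",
    ]


for stmt in attach_disk_statements("disk2", "/mnt/disk2/pgdata", "conditions"):
    print(stmt)
```

The statements would then be run as a superuser through psql or any Postgres client.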


cevian 2873 days ago

The latter. Because of the table partitioning, there is not much dependency between old and new data, so we've seen consistent performance as you add more and more data.
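
A toy model of why that holds, assuming rows are routed to fixed-width time chunks. This is an illustration only, not TimescaleDB's implementation; the one-day chunk width and all names are hypothetical.

```python
from collections import defaultdict

CHUNK_SECONDS = 86_400  # hypothetical chunk width: one day


def chunk_key(timestamp: int) -> int:
    """Start of the time interval (chunk) that a Unix timestamp falls into."""
    return timestamp - (timestamp % CHUNK_SECONDS)


chunks = defaultdict(list)  # chunk start -> rows stored in that chunk


def insert(timestamp: int, value: float) -> int:
    """Append a row to its chunk and return the key of the chunk touched."""
    key = chunk_key(timestamp)
    chunks[key].append((timestamp, value))
    return key


# Load 30 days of "old" data, one reading per hour.
for hour in range(30 * 24):
    insert(hour * 3600, 1.0)

# A new write lands in a fresh chunk; the 30 older chunks are not touched,
# so the work per insert does not grow with the amount of historical data.
touched = insert(30 * 86_400 + 5, 2.0)
print(len(chunks), "chunks;", len(chunks[touched]), "row in the newest one")
```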


qaq 2872 days ago

It's not only about size but also about speed: column stores that store compressed data and process it in blocks using SIMD perform significantly faster on many query types.


cevian 2872 days ago

While it's true column stores perform better on single-column aggregates, they perform worse on multi-column operations, thresholding queries, and other types of complex analytics that one often sees on time-series workloads. We have published benchmarks on some common column stores that show these tradeoffs.


wenc 2872 days ago

This is slightly outside the scope of what TimescaleDB handles, but conceptually, is it possible to create both row and column indices and have the query engine hit one or the other depending on the query?

SQL Server sort of does this with traditional indices and columnstore indices. Indices are derived data structures that represent a view of the original dataset, so in theory it shouldn't matter if the original data is stored in rows or columns.


qaq 2872 days ago

Would be interesting to see benchmarks vs. performant options, e.g. Vertica and ClickHouse.
