TimescaleDB clustering is currently limited to a single primary with multiple read-only replicas (which provide both HA failover and scaling concurrent reads).
So the limit here is the storage capacity available to each node, and the above reflects the amount of data we've seen in use in various RAIDed or network-attached storage environments (like the cloud).
One interesting thing here is that you can "elastically" add a new disk to an existing hypertable, and new writes will be automatically load balanced across the disks, including the new one. (In Postgres speak, we support multiple tablespaces in a single hypertable, and we allow you to dynamically attach a new tablespace to an existing hypertable: https://docs.timescale.com/api#attach_tablespace )
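To make that concrete, here is a minimal sketch of the two statements involved, built as strings in Python so nothing is executed. The tablespace name `disk2`, the mount path, and the hypertable name `conditions` are made up for illustration; `CREATE TABLESPACE` is standard Postgres and `attach_tablespace` is the function linked above.

```python
def attach_disk_statements(tablespace, location, hypertable):
    """Build the SQL that would add a new disk to an existing hypertable.

    All names passed in are hypothetical. The statements are only built,
    not executed, and no identifier quoting is done (sketch only).
    """
    return [
        # Standard PostgreSQL: register a directory on the new disk as a tablespace.
        f"CREATE TABLESPACE {tablespace} LOCATION '{location}';",
        # TimescaleDB: attach that tablespace to the existing hypertable.
        f"SELECT attach_tablespace('{tablespace}', '{hypertable}');",
    ]


statements = attach_disk_statements("disk2", "/mnt/disk2/pgdata", "conditions")
for stmt in statements:
    print(stmt)
```

You would run the printed statements through psql or your usual client; existing data stays where it is, and only new writes are affected.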
The latter. Because of the table partitioning, there is not much dependency between old and new data, so we've seen consistent performance as you add more and more data.
It's not only about size but about speed: column stores that store compressed data and process it in blocks using SIMD perform significantly faster on many query types.
While it's true column stores perform better on single-column aggregates, they perform worse on multi-column operations, thresholding queries, and other types of complex analytics that one often sees on time-series workloads. We have published benchmarks on some common column stores that show these tradeoffs.
This is slightly outside the scope of what TimescaleDB handles, but conceptually, is it possible to create both row and column indices, and have the query engine hit one or the other depending on the query?
SQL Server sort of does this with traditional indices and columnstore indices. Indices are derived data structures that represent a view of the original dataset, so in theory it shouldn't matter if the original data is stored in rows or columns.
The bulk of time-series data is floating point numbers, which are fairly tiny in terms of storage.
100 TB is a lot of time series.
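For a rough sense of scale, here is a back-of-envelope sketch. The assumptions are mine, not from the thread: decimal terabytes, 8-byte floats, an 8-byte timestamp per sample, one sample per second, and no row, index, or compression overhead (real on-disk numbers will differ).

```python
TB = 10**12                 # decimal terabyte in bytes (assumption)
FLOAT64_BYTES = 8           # one double-precision value
TIMESTAMP_BYTES = 8         # one 8-byte timestamp per sample (assumption)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

capacity = 100 * TB

# Bare floats only, ignoring timestamps and any storage overhead.
raw_values = capacity // FLOAT64_BYTES

# (timestamp, value) pairs, still ignoring per-row overhead.
pairs = capacity // (FLOAT64_BYTES + TIMESTAMP_BYTES)

# Number of series that could each hold one year of 1 Hz samples.
series_years = pairs // SECONDS_PER_YEAR

print(f"bare float64 values:   {raw_values:,}")
print(f"(timestamp, value):    {pairs:,}")
print(f"series-years at 1 Hz:  {series_years:,}")
```

Under these assumptions that is 12.5 trillion bare floats, or roughly 198 thousand series sampled once a second for a full year.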