by ryanbooz 1465 days ago
(blog author)

Thanks for the feedback! Out of curiosity, if the data you're trying to analyze doesn't have time as one of the critical components, what kind of data is it?

Always helpful to learn a bit more.


Is time series the right answer for anything with a time dimension, or is it mostly for things where time is THE critical dimension? For example, business intelligence applications care about time, but they also care about a whole bunch of other stuff as well (I think with at least as much importance)--is timeseries the right answer for this use case?
Anytime you’re interested in seeing how things change over time, that’s time series. It’s a very big category of use cases.
Sure, but analytics is sometimes change over time, and other times change over some other dimension. Presumably if time is just one dimension among many, then timeseries is probably not the right fit in general?
As with anything else, you can approach specific problems in many different ways.
Timeseries is usually specific to use cases where your data represents some signal over time, like a temperature reading, a stock price, etc.

So you need two components: a timestamp and a signal reading. In that case, all the timeseries-specific analytics apply: sliding/tumbling windows, average per window, smoothing, autocorrelation, and the other techniques from digital signal processing and timeseries analytics.
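To make the window terminology concrete, here is a minimal plain-Python sketch of a tumbling-window average and a sliding-window (moving) average. The function names and the choice of integer epoch seconds for timestamps are assumptions made for illustration; they are not from any particular library.

```python
from collections import defaultdict


def tumbling_avg(readings, width):
    """Average per fixed, non-overlapping window.

    readings: iterable of (timestamp_seconds, value) pairs
    width:    window width in seconds
    Returns {window_start: average}, ordered by window start.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        window_start = ts - (ts % width)  # align to the start of the window
        buckets[window_start].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


def sliding_avg(values, size):
    """Moving average over the most recent `size` readings.

    Emits one value per position once a full window is available.
    """
    out = []
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= size:
            window_sum -= values[i - size]  # drop the reading that left the window
        if i >= size - 1:
            out.append(window_sum / size)
    return out
```

A tumbling window assigns each reading to exactly one bucket, while a sliding window overlaps its neighbors, which is why the latter is the usual choice for smoothing.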

Your regular monthly sales data for ACME Corp by product category and storeId is not timeseries, just general BI.

(NB - post author)

Great definition! Having worked for years on both energy and IoT applications, the argument here is that your "monthly sales data" is likely being aggregated from your time-series data (sales transactions over time). If you store the transaction data in a database like TimescaleDB, then continuous aggregates provide the straightforward method for keeping that aggregated, monthly sales data up-to-date. :-D!
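As a rough illustration of the idea (not of how TimescaleDB implements continuous aggregates), the sketch below keeps a monthly total current as each timestamped sale arrives, instead of recomputing it from all raw rows. The `MonthlySales` class and its method names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


class MonthlySales:
    """Hypothetical in-memory stand-in for an incrementally maintained aggregate."""

    def __init__(self):
        # (year, month, category) -> running total
        self.totals = defaultdict(float)

    def record(self, sold_at: datetime, category: str, amount: float) -> None:
        """Fold one raw transaction into its month's running total."""
        self.totals[(sold_at.year, sold_at.month, category)] += amount

    def total(self, year: int, month: int, category: str) -> float:
        """Read the aggregate without touching the raw transactions."""
        return self.totals.get((year, month, category), 0.0)
```

The raw rows are still time-series data; the monthly BI figure is just a view derived from them.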

That's very zen, but ultimately it doesn't answer my question.
Well, I could be more opinionated, but even in very specific situations, reasonable people disagree about the best way to model data, and I don't really know a lot about your specific problem-space or situation.

My personal preference is to think of almost any changing measurement or event stream as a time series. See also the reply to a sibling comment.

Time is usually in the table, but not always in an analytics query.

I'm building https://luabase.com/. A good example would be summing transactions by the ethereum contract address.

Totally agree. Time is a primary component, but it might not always be the primary query parameter... at least once the data is aggregated.

In the example you gave, I'd assume that you wouldn't run a query over billions of transactions to do a sum. (Obviously, indexes would be part of reducing this number at query time.) I would think you'd probably want to aggregate the sum per hour/day for all addresses and then decide at query time whether you need to sum all transactions for all time or within a specific range. Whenever you need to constrain the query based on time, you're still using the data like time-series, even if the final result doesn't have a date on it. And whenever you're doing the same aggregate queries over and over, that's where Continuous Aggregates can help!

For example, using the (transaction??) timestamp to efficiently store the data in time-based partitions (TimescaleDB chunks) unlocks all kinds of other functionality. You can create continuous aggregates to keep that historical aggregate data up-to-date (even if you need to eventually drop or archive some of the raw transaction data). With 2.7, you can create indexes on the views in ways you couldn't before which speeds up queries even more. Chunks can be compressed (often 93%+!!) and make historical queries faster while saving you money.

So in that sense, time is the component that helps unlock features - when time is an essential component of the raw data, but the query-time analytics don't have to specifically be about time. PostgreSQL and TimescaleDB work together to efficiently use indexes and features like partition pruning to provide the performance you need.
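The "aggregate per hour, then decide at query time" approach described above can be sketched in plain Python. This is only an illustration of the data flow under assumed names (`rollup_hourly`, `sum_by_address`) and an assumed `(timestamp_seconds, address, amount)` row shape; in TimescaleDB the rollup would be a continuous aggregate rather than application code.

```python
from collections import defaultdict

HOUR = 3600  # seconds


def rollup_hourly(transactions):
    """Pre-aggregate raw rows into {(hour_start, address): total}.

    transactions: iterable of (timestamp_seconds, address, amount)
    """
    totals = defaultdict(float)
    for ts, address, amount in transactions:
        totals[(ts - ts % HOUR, address)] += amount
    return dict(totals)


def sum_by_address(hourly, start=None, end=None):
    """Sum per address over the rollup, optionally limited to [start, end).

    The result carries no timestamp, but the time bounds still decide
    which hourly buckets are read.
    """
    result = defaultdict(float)
    for (hour, address), total in hourly.items():
        if start is not None and hour < start:
            continue
        if end is not None and hour >= end:
            continue
        result[address] += total
    return dict(result)
```

Calling `sum_by_address(hourly)` gives the all-time sum per address, while passing `start` and `end` answers the same question for a specific range from the same rollup.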

BTW, I'm not sure if you saw the post and tutorial we released last week showing how to analyze transactions on the Bitcoin blockchain. [1][2] It's a similar use case, and not all of it is tied to time-based queries. There are also other companies currently indexing other blockchains (Solana, for instance) that have had really great success with TimescaleDB (and it gets even better with TimescaleDB 2.7!)

Thanks!

[1]: https://www.timescale.com/blog/analyzing-the-bitcoin-blockch...

[2]: https://docs.timescale.com/timescaledb/latest/tutorials/anal...

We see those types of queries commonly in TimescaleDB. And, for example, both compression and "horizontal" scale-out offer ways to optimize for these types of analytical queries.

More concretely, we see a lot of web3/crypto use cases that make a wallet ID, NFT name, or ticker a top-level consideration.

E.g., use your contract address as the segmentby field for compression.