|
Relational databases represent a column/row-oriented architecture. Data historians are a specialized, non-relational, time-oriented architecture. Using time as a key in a relational index implies that only ordering is important, but that is not the case. Distance between points in time is extremely important, because time is a continuous one-dimensional line and data points sit at varying distances from each other on that line. Data historians are architected both to preserve this temporal relationship and to take advantage of it by eliminating duplicate data and employing temporal compression techniques, so they can store millions of readings per second for years' worth of data.

> From reading your website

My website doesn't have much to do with this, because Sentenai isn't a time series database system. I did, however, spend most of my time in research working on temporal data systems, and have been fortunate to collaborate with or learn from researchers who have spent decades solving the unique problems that temporal data presents. What you might consider uncommon for your use cases is extremely common in manufacturing, defense and other areas. There's a decades-old industry around database systems that handle time natively. And while many support SQL as a lingua franca, and some are column stores, they're by no means relational: they either extend SQL to support time, or limit non-temporal joins to ensure performance. StreamBase, Kdb, Aurora and many other specialized architectures exist because one size does not fit all. Michael Stonebraker, whose work has included StreamBase, Vertica, Tamr, Postgres, Aurora, and many others, famously published a paper on exactly this problem: https://cs.brown.edu/~ugur/fits_all.pdf .

Further reading that might be illuminating:

1. http://cs.brown.edu/research/aurora/vldb03_journal.pdf
2. http://www.cs.rochester.edu/u/james/Papers/AllenFerguson-events-actions.pdf
3. https://books.google.com/books?id=BK6oCAAAQBAJ&pg=PA9&source=gbs_toc_r&cad=4#v=onepage&q&f=false (excerpt)
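To make the point about distance between readings concrete, here is a minimal sketch, assuming readings arrive as `(timestamp, value)` pairs. It uses simple deadband (exception) compression, which is only one basic technique; real historians often use more elaborate schemes (e.g. swinging-door trending), which this is not. The function names are hypothetical. With sample-and-hold interpolation, the time-weighted mean survives compression, while a naive average over rows (which treats time as mere ordering) does not.

```python
from typing import List, Tuple

Reading = Tuple[float, float]  # (timestamp_seconds, value), sorted by timestamp


def deadband_compress(readings: List[Reading], deadband: float) -> List[Reading]:
    """Keep a reading only if it moves more than `deadband` from the last stored value.

    The first and last readings are always kept so the covered time span is preserved.
    """
    if not readings:
        return []
    stored = [readings[0]]
    for t, v in readings[1:-1]:
        if abs(v - stored[-1][1]) > deadband:
            stored.append((t, v))
    if len(readings) > 1:
        stored.append(readings[-1])
    return stored


def time_weighted_mean(points: List[Reading]) -> float:
    """Sample-and-hold average: each value counts for the time until the next point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for (t0, v0), (t1, _) in zip(points, points[1:]):
        total += v0 * (t1 - t0)
    return total / (points[-1][0] - points[0][0])


# Irregularly spaced readings: the value holds at 10.0 for ten seconds, then 20.0.
raw = [(0, 10.0), (1, 10.0), (2, 10.0), (10, 20.0), (11, 20.0), (12, 20.0)]
compressed = deadband_compress(raw, deadband=0.5)
print(compressed)                    # [(0, 10.0), (10, 20.0), (12, 20.0)]
print(time_weighted_mean(raw))       # same as for the compressed series
print(time_weighted_mean(compressed))
print(sum(v for _, v in raw) / len(raw))  # 15.0: a row-wise mean ignores time gaps
```

Half as many points are stored, yet the time-weighted answer is unchanged; that only works because the gaps between timestamps are treated as meaningful data rather than as a sort key.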
|
Every issue mentioned in the abstract/intro (which is meant to motivate the paper) seems like it could be solved with add-ons to existing application databases (albeit with their most recent developments/capabilities in mind). The very distinction between HADP and DAHP systems seems silly: it's just a question of write load, which is fundamentally solved only with batching and efficient IO, or by giving up durability. It doesn't seem inherent to the data model. There are also assertions like:
> Moreover, performance is typically poor because middleware must poll for data values that triggers and alerters depend on
But, like, Postgres though: you're free to define a better, more efficient LISTEN/NOTIFY-based trigger mechanism; for example, you can run highly optimized code right in the DB... Thinking of some of the cases called out in the paper, here's what comes to mind:
- Change tracking vs. only-current-value -> just record changes/events; as for tables getting super big, partitioning helps (TimescaleDB does this)
- Backfilling @ request time -> a Postgres extension could do this
- Alerting -> Postgres does have customizable functions/procedures as well as LISTEN/NOTIFY. If the paper is right (?) about TRIGGERs not scaling, then this might be its most reasonable point.
- Approximate query answering is possible in Postgres with stuff like HyperLogLog, but the paper is certainly right that it isn't built in by default.
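The first point above (record changes instead of overwriting, then partition by time so tables stay manageable) can be sketched in a few lines. This is a hypothetical in-memory illustration, assuming fixed-width time partitions and numeric values; the class and method names are made up, and it is not how TimescaleDB or any other system implements it.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class PartitionedChangeLog:
    """Append-only (timestamp, value) changes per key, bucketed into fixed-width time partitions."""

    def __init__(self, partition_width: float):
        self.width = partition_width
        # partition index -> key -> list of (timestamp, value), kept sorted by timestamp
        self.partitions: Dict[int, Dict[str, List[Tuple[float, float]]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def _partition(self, ts: float) -> int:
        return int(ts // self.width)

    def record(self, key: str, ts: float, value: float) -> None:
        """Store a change; never overwrite history. Out-of-order events are inserted in place."""
        rows = self.partitions[self._partition(ts)][key]
        i = bisect_right([t for t, _ in rows], ts)
        rows.insert(i, (ts, value))

    def value_as_of(self, key: str, ts: float) -> Optional[float]:
        """Latest value at or before ts, scanning partitions from ts's partition backwards."""
        p = self._partition(ts)
        for part in sorted((q for q in self.partitions if q <= p), reverse=True):
            rows = self.partitions[part].get(key, [])
            idx = bisect_right([t for t, _ in rows], ts)
            if idx > 0:
                return rows[idx - 1][1]
        return None

    def current_value(self, key: str) -> Optional[float]:
        """The 'only-current-value' view, derived from the newest partition holding the key."""
        for part in sorted(self.partitions, reverse=True):
            rows = self.partitions[part].get(key)
            if rows:
                return rows[-1][1]
        return None

    def drop_partitions_before(self, ts: float) -> None:
        """Retention: discard whole old partitions instead of deleting row by row."""
        cutoff = self._partition(ts)
        for part in [q for q in self.partitions if q < cutoff]:
            del self.partitions[part]


log = PartitionedChangeLog(partition_width=100.0)
log.record("temp", 5.0, 20.0)
log.record("temp", 150.0, 21.5)
log.record("temp", 120.0, 21.0)  # arrives late, lands in the right place
print(log.value_as_of("temp", 130.0))  # 21.0
print(log.current_value("temp"))       # 21.5
```

Dropping a whole partition is the cheap operation that makes retention on very large change tables tractable, which is the gist of the "partitioning helps" remark.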
Maybe I'm mistaking the extensibility of Postgres for the redundancy of the paradigm, akin to thinking something like "Lisp is multi-paradigm, so why would I use Haskell for its enhanced inference/safety?"
I'm still reading the paper, so maybe by the end it will dawn on me.