Hacker News new | ask | show | jobs
by ced 2777 days ago
You don't have "missing rows". For time series A you have a time point at 12:01 AM, a datapoint at 12:02 AM, and another datapoint two weeks later at 5:04 PM. For time series B the times are different. You need some notion of whatever state the physical system was in at any given time.
1 comments

Yes, I understand this as the "last value recorded" concept in my comment. KDB+ supports this with "asof" joins. Others can just do it by scanning a wider time frame or the entire table.

Null gaps in a columnstore can be skipped over basically instantaneously and usually are just zone map/index lookups. Again I question how common this query is and whether it's really worth limiting yourself to a special TSDB because of it.

> Others can just do it by scanning a wider time frame or the entire table.

"Scanning the entire table" for every request to have the last value recorded is rarely a practical option.

> KDB+ supports this with "asof" joins.

> [...]

> Again I question how common this query is and whether it's really worth limiting yourself to a special TSDB because of it.

KDB literally markets itself as a time series database. What's the point you're making again?

I think TimescaleDB lacks an "asof" function for now, but it makes up for it by having the full power of PostgreSQL for other stuff. Regardless, Time Series databases like KDB and TimescaleDB are useful.

EDIT:

it looks like TimescaleDB recommends using

  ORDER BY time DESC LIMIT 1
to get the most recent value for any particular set of sources that you're SELECTing over, which would use indices and be reasonably fast.
> KDB literally markets itself as a time series database

kdb+ also markets itself as OLAP/OLTP as well.

A lot of these guys market themselves as a "time series database" because kdb+ do and they want to be compared with kdb+ by people who haven't used kdb+ (but might be considering it).

Distributed relational column-oriented databases are best at large data volumes and OLAP queries. KDB+ is one of those, even though they call it a TSDB in marketing terminology because of its FinTech customer base.

TimescaleDB is not a TSDB, it's an extension to add automatic partitioning to PostgreSQL tables. Timescale helps Postgres get more performance, but it does not give you the full capabilities of a real distributed column-oriented system. If you must use PostgreSQL though then it's a good compromise.

The query you posted does not match the discussion about the last value at a specific instant in time, only the last value ever recorded in the table for that key.

> Distributed relational column-oriented databases are the best at large OLAP data volumes and queries. KDB+ is one of those, even though they call it a TSDB in marketing terminology because of its FinTech customer base.

You're mistaken about Kdb's relational features. Kdb was designed as a time series processing engine using arrays (columns). Column storage doesn't have anything to do with whether a database is relational or not, and Kdb wasn't originally any more relational than the language Erlang is.

I never said relational is related to column-storage.

kdb+ has SQL semantics and relational queries, but it's a combination of the q language integrated into a database so sure, it's a superset of a relational database. Perhaps we disagree on what relational means.

My point was that the current relational features of Kdb didn't exist originally (they were grafted on later) so it's not "marketed" as a TSDB, but it is in fact a TSDB marketed as a relational DB.

The definition of relational is very precise, whether you use the domain calculus, relational calculus or relational algebra. Wiki has a good summary of what must be natively supported by a database system to be relational: https://en.wikipedia.org/wiki/Relational_algebra

If you don't implement this at the transaction log level, but implement it via emulation at the output level, you can't make full relational guarantees, so these operations are fundamental to database design.

> Kdb wasn't originally any more relational than the language Erlang is.

To wit: SQL wasn't originally any more relational[1] than the language Erlang is.

[1]: Codd, E. F. A Relational Model of Data for Large Shared Data Banks -- Communications of the ACM, Vol. 13, No. 6, June 1970, pp. 377-387

I'm not sure what value such a statement brings the world.

I was drawing a comparison between the Kdb runtime and the Erlang runtime, because the OP seems to be conflating the ability to emulate relational features at the application level with first-class support for relational semantics as in a relational database. Support for relational semantics can be emulated via programming languages with sophisticated runtimes like Erlang, but I wouldn't classify any database supporting a Turing complete language with a sophisticated runtime as a relational database.
In our case, "last value" isn't good enough. We do interpolation, and use compression algorithms for which interpolation minimizes reconstruction error.