| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by bluestreak 2267 days ago

Good summary, thank you.

- we will extend SIMD to where clause, keyed aggregations, sampling, ordering, joins etc. It is a matter of time.

- do you mind elaborating on how we screwed up date and time?

- what makes you think we are never going to compete on performance with kdb+?

2 comments

jnordwick 2267 days ago

- nano are important for keeping ordering. while you may return results in insert order it is nice to have them so any other operations done in them outside the db you can retain (or recreate) that ordering. in financial systems nanos have become a sort of defacto standard for this. for examsple, all our messaging timestamps at places i've worked are always nano for anything written in the last 5-10 years.

Also when you are trying to do calculations on high-frequency data (tick, iot) it ruins your ability to take meaningful deltas (eg, arrival rates) since you get a lot of 0s and 1s for the time deltas. Its difficult to take weighted averages with weights of 0s.

the issues solving that (if you really need a wider range) are easier to solve that having to force everything down to micros and creating ways around that. (eg, kdb uses multiple date, time, and timestamp types and it doesn't use the unix epoch since it isn't very useful for tick, censor, or any high-frequency data i've seen).

better than a double that some systems still use.

-kdb's secret sauce that people don't seem to understand is its query language that more naturally fits the time series domain. It isn't really a database as more it is an array language with a database component. (eg, try to write an efficient sql query that calculates 5 minute bars on tick data).

I actually like Java too - I've written or worked on a couple trading systems written in core java. just get good devs who understand what it means to write zero gc code, abuse off-heap memory, and understand what hotspot can intrinsify. If you can stay in the simd code for all the heavy lifting loops (filters, aggregates, etc), I don't think java will be an impediment.

I think you have parts going in the correct direction, and you seem to have good experience from looking at the bios. Nothing really un-fixable (or un-addable) in what I saw glancing at your docs. I did bookmark you to see how the db goes. Will prob check out soon.

link

patrick73_uk 2267 days ago

I'm a QuestDB dev, with regards to nano timestamps, we dont use a nanosecond timestamp because its not possible to be accurate to that resolution with current hardware. However, on a single host the nano second clocks are precise and monotonic, they would be useful to maintain order. I think they do make sense and we will have to look into providing timestamps to that resolution.

link

salmonlogs 2267 days ago

"its not possible to be accurate to that resolution with current hardware"

Are you referring to the clock precision of consumer grade hardware here?

In my experience the vast majority of financial time series data is reported in nanoseconds. The data providers, vendors, exchanges and data brokers absolutely have hardware capable of measuring timestamps in nanoseconds.

The accuracy doesn't have to be to 1ns of resolution to warrant measuring in nanos - even to the nearest 100ns is a useful and meaningful improvement beyond micros.

link

bluestreak 2267 days ago

We are going to add a new type in the future to support nanos! Sorry for the confusion.

link

jnordwick 2266 days ago

can you please contact me to jnordwick@gmail.com?

link

binomiq 2267 days ago

KDB works around this by storing the nanos in 64 bits but only for a particular time range.

1707.09.22D00:12:43.145224194 (max negative) to 2292.04.10D23:47:16.854775806 (max positive)

With 0 at 2000.01.01D00:00:00.000000000

link

smabie 2267 days ago

The reason why you cannot ever compete with kdb+/q is because the database and language run in one address space. Your benchmark gets around this problem by using the built in sum() function, but kdb+/q can just execute arbitrary code and never suffer a performance penalty. Unless you plan on integrated a high performance programming language into your DB, it simply will not be possible to ever meaningfully compete on the effective total time of complex queries.

I, of course, am not disparaging your work, the performance numbers are very impressive!

link

patrick73_uk 2267 days ago

I'm a QuestDB dev, data on a QuestDb is also stored in a single address space and SQL queries are compiled into objects that run in that address space. However, it is just SQL not a bespoke language. A future possibility would be to allow queries in java or scala.

link

smabie 2267 days ago

Implementing the ability for QuestDB to dynamically load jars would be really cool. And if you exposed an interface to directly communicate with the Db, you could get rid of the SQL parsing overhead as well. This would also allow QuestDB to function as an app engine of sorts, just like kdb+/q. I see real value in that for latency sensitive financial applications.

link