Hacker News
by jandrewrogers 3348 days ago
This observation applies to database engines generally. It is straightforward to explain: almost everyone with deep expertise in sophisticated database engine internals is contractually prohibited from disclosing anything about the design of such things. It is an industry steeped in trade secrets. Sophisticated database engines are littered with novel algorithms and designs that have never been published. Because basic performance superiority is a key market differentiator, database companies have invested heavily in computer science R&D for decades to get an edge they are loath to share. Sadly, academia is increasingly in the role of independently re-discovering what has been known for 10-20 years but treated as a secret.

The other big factor is that almost all open source database engines are the product of someone who is basically designing their first database engine. Prometheus is now on its third(?) redesign, and while you can see the growth of the designers' skill, it is still a relatively naive design. This is not a jab at the designers; it just takes many creative iterations over many years to discover all the tricks that the experts know but nobody publishes. I probably spent a decade producing database engines that in hindsight were pretty mediocre, but at the time I thought they were sophisticated -- I didn't know what I didn't know.

As a last observation, building a genuinely sophisticated database engine requires a commitment to writing an enormous amount of code, on the order of 20kLoC of dense C++, before the barest database kernel can even bootstrap, never mind provide any database functionality. I've noticed that most open source projects need to see the payoff of code running and doing something minimally useful with much lower levels of investment. For database engines, getting to running code as quickly as possible compromises the design, but the pressure to get there is understandable. The amount of time and effort required to design and build a state-of-the-art database engine goes far beyond what most people are willing to invest in what is essentially a hobby.

3 comments

> Sadly, academia is increasingly in the role of independently re-discovering what has been known for 10-20 years but treated as a secret.

This has been true of network programming for a long time as well. Private companies find novel ways of switching and routing packets, reducing latency, etc. and academia is left to pick up the pieces.

> This observation applies to database engines generally. It is straightforward to explain: almost everyone with deep expertise in sophisticated database engine internals are contractually prohibited from disclosing anything about the design of such things. It is an industry steeped in trade secrets. Sophisticated database engines are littered with novel algorithms and designs that have never been published. Because basic performance superiority is a key market differentiator, database companies have invested heavily in computer science R&D for decades to get an edge they are loath to share. Sadly, academia is increasingly in the role of independently re-discovering what has been known for 10-20 years but treated as a secret.

This makes an interesting case for software patents: what if database companies patented those insights instead of keeping them secret, and thus could keep others from using them for, say, 3-4 years? At the end of that time anyone could use those techniques.

This seems to me better than either trade secrets or the current absurdly long patent lifetimes.

KDB+ is not 20kLoC of dense C++.
It's 1VLLoC of C, right? ;)