Hacker News new | ask | show | jobs
by anon1m0us 2385 days ago
I'd like to see some real metrics. All the words are subjective and the technical ones are common to other databases as well. How many results can it return in how many seconds from a table with how many rows? Can it do full text indexing? Can it aggregate?

It feels like something from wallstreet, the language is almost pump and dump "I have this really great stock tip from this guy who made millions once, he's a legend. He says buy, so you should."

We need more tangible and quantifiable information about this database. Until then it's not even hype it's vaporware.

1 comments

Calling anything from Arthur "vaporware" is really laughable.

This article isn't great, but it's from a finance site, not a technical site: of course it's going to be low on technical details. You would think that people who want technical details would know how to use a search engine.

Further, kdb/kdb+ (which Shakti is the successor of) is a columnar-based database. "How many results can it return [...] from a table with how many rows?" is completely irrelevant.

There are hundreds of benchmarks for kdb+, and the speed of k is unrivaled.

>There are hundreds of benchmarks for kdb+, and the speed of k is unrivaled.

Eh, I worked on array languages in grad school and the speed of k was significantly over-hyped. It's trivial to beet with Python and no contest if you want to work in a compiled language. Wall Street develops weird religions around tech.

In my experience it is mostly trivial to beat with C if you adopt columnar storage (which isn’t idiomatic), but even numexpr doesn’t beat k (though it seems like it should), and numpy is still way behind; pandas and plain python not even in the same ballpark.

What approach did you use?

The one thing nothing beats k on is functionality per time spent (if you are proficient). It is often 100 times shorter but only takes 10 times as long to write.

I'm going to sidestep arguments about its terseness since some people are into that and it's really an aesthetic choice. But for machine learning or numerical algorithms, I found it quite hard to make k nearly as fast as vanilla NumPy code. Overall there's really no good reason for it to be faster. It didn't have (as of ~2011) a JIT or any mechanism for operator fusion, so you end up allocating many intermediates, sometimes using quadratic space which gets immediately reduced along some axis.

Yeah, I know "Arthur Whitney is really smart"...but that's not actually a technical reason why k would be faster. I don't remember the specifics but I think that many of the array operators weren't even multi-threaded and I remember only a few of the array operators getting FLOPs that would correspond to SIMD acceleration. So, for the most part, it seemed like a lot of vanilla single threaded C code behind the operators.

It's possible that the implementation got smarter over the past decade, but when I was working in the space it seemed like the hagiography and actual performance were worlds apart.

I didn't try modern machine learning stuff, so I cannot comment on that.

It doesn't do operator fusion or JIT, but neither does numpy ... numexpr, numba and pythran do, but you were referring to "plain python" in your original post.

Indeed, K is mostly vanilla single threaded code. When I started using K, i was also thinking "neat, simplifiied APL, but it won't be fast" (I had previous positive experience with APL). But it was way faster than plain threaded code should be. The reasons, AFAICT at the time were:

1. Access patterns are extremely predictable, which means much of the data is in L1 much of the time; That, on its own, delivers a factor of ~10 speed compared to having the same data scattered in L2 if you are memory bound.

2. The code is incredibly small and in 2003 when I looked at it, mostly fit inside the I-cache, which meant even the branch prediction available at the time was doing way better than one would expect.

3. The primitives are very well chosen and play well together.

The hidden elephant in the china shop is that your code has to be idiomatic for these to matter. See e.g. [0] from Stevan Apter, for aggregating functions in a toy database e.g. group[avg] - the version he uses, vs. 4x slower one in the comment.

I have no doubt it is possibly to beat k with numpy; it might even be easy if you are more proficient with the latter than with the former. However, for the kind of exploratory data work I did in the day, it is very hard to beat overall when you take thinking, debugging and experimenting into account. And if/when you have converged onto a well-enough defined computational pipeline, the best tool for that specific job (C++, CUDA, TF, MKL) is often the only reasonable choice.

> when I was working in the space it seemed like the hagiography and actual performance were worlds apart.

I don't know who you were talking to, but for me it was always an optimization on the "effort/result" metric, not on either on its own. Python often requires less effort, but iteration runtime speed (even with numpy) is horrible. C+CUDA delivers faster results but the iteration development speed is horrible. For me K struck a sweet spot. YMMV

[0] http://www.nsl.com/k/t.k

But for machine learning or numerical algorithms, I found it quite hard to make k nearly as fast as vanilla NumPy code

Observe that KX’s approach to ML is... to just wrap Python https://code.kx.com/v2/ml/

At the least, 'range' (!3 -> 0 1 2) is now deferred (including with offsets eg 1+!3).
Here are some actual benchmarks comparing Python (with pandas/numpy) vs k (aka q/kdb+). It seems k is orders of magnitude more performant - without applying any no brainer optimizations/indexing:

https://www.linkedin.com/pulse/data-analysis-example-python-...

https://www.linkedin.com/pulse/lists-python-q-side-by-side-f...

You must have been writing incredibly terrible k, or using a bootleg interpreter.
Shakti themselves used rows as the comparison metric in the original link: https://shakti.com/database-software-history/

Again, saying the speed of k is unrivaled is hype speak. You're adding negative value to the discussion.

To give you an idea, here are some benchmarks comparing Python (with pandas/numpy) vs k (aka q/kdb+).

https://www.linkedin.com/pulse/data-analysis-example-python-...

https://www.linkedin.com/pulse/lists-python-q-side-by-side-f...

I think that software history page is meant to convey general workloads achievable, rather than performance.

-(not the author of the benchmarks nor an employee of shakti FWIW)

> Calling anything from Arthur "vaporware" is really laughable.

kOS?

It was a research project, and it actually booted on bare metal and ran what it was supposed to (milspec bare bones os with gui, filesysten, db, etc in a few hundred KB)
Do you have a link? This sounds fascinating.
kOS exists, at least one HN user has used it ('geocar). As a research operating system, seeing it released into the public was never a likely goal, and as far as I know no one claimed it would be..
It really depends on what specific benchmarks you're talking about, but I would be really interested in seeing benchmarks in which k beats C with AVX