Hacker News new | ask | show | jobs
by iskander 2389 days ago
I'm going to sidestep arguments about its terseness since some people are into that and it's really an aesthetic choice. But for machine learning or numerical algorithms, I found it quite hard to make k nearly as fast as vanilla NumPy code. Overall there's really no good reason for it to be faster. It didn't have (as of ~2011) a JIT or any mechanism for operator fusion, so you end up allocating many intermediates, sometimes using quadratic space which gets immediately reduced along some axis.

Yeah, I know "Arthur Whitney is really smart"...but that's not actually a technical reason why k would be faster. I don't remember the specifics but I think that many of the array operators weren't even multi-threaded and I remember only a few of the array operators getting FLOPs that would correspond to SIMD acceleration. So, for the most part, it seemed like a lot of vanilla single threaded C code behind the operators.

It's possible that the implementation got smarter over the past decade, but when I was working in the space it seemed like the hagiography and actual performance were worlds apart.

3 comments

I didn't try modern machine learning stuff, so I cannot comment on that.

It doesn't do operator fusion or JIT, but neither does numpy ... numexpr, numba and pythran do, but you were referring to "plain python" in your original post.

Indeed, K is mostly vanilla single threaded code. When I started using K, i was also thinking "neat, simplifiied APL, but it won't be fast" (I had previous positive experience with APL). But it was way faster than plain threaded code should be. The reasons, AFAICT at the time were:

1. Access patterns are extremely predictable, which means much of the data is in L1 much of the time; That, on its own, delivers a factor of ~10 speed compared to having the same data scattered in L2 if you are memory bound.

2. The code is incredibly small and in 2003 when I looked at it, mostly fit inside the I-cache, which meant even the branch prediction available at the time was doing way better than one would expect.

3. The primitives are very well chosen and play well together.

The hidden elephant in the china shop is that your code has to be idiomatic for these to matter. See e.g. [0] from Stevan Apter, for aggregating functions in a toy database e.g. group[avg] - the version he uses, vs. 4x slower one in the comment.

I have no doubt it is possibly to beat k with numpy; it might even be easy if you are more proficient with the latter than with the former. However, for the kind of exploratory data work I did in the day, it is very hard to beat overall when you take thinking, debugging and experimenting into account. And if/when you have converged onto a well-enough defined computational pipeline, the best tool for that specific job (C++, CUDA, TF, MKL) is often the only reasonable choice.

> when I was working in the space it seemed like the hagiography and actual performance were worlds apart.

I don't know who you were talking to, but for me it was always an optimization on the "effort/result" metric, not on either on its own. Python often requires less effort, but iteration runtime speed (even with numpy) is horrible. C+CUDA delivers faster results but the iteration development speed is horrible. For me K struck a sweet spot. YMMV

[0] http://www.nsl.com/k/t.k

But for machine learning or numerical algorithms, I found it quite hard to make k nearly as fast as vanilla NumPy code

Observe that KX’s approach to ML is... to just wrap Python https://code.kx.com/v2/ml/

At the least, 'range' (!3 -> 0 1 2) is now deferred (including with offsets eg 1+!3).