| HN Mirror

I didn't try modern machine learning stuff, so I cannot comment on that.

It doesn't do operator fusion or JIT, but neither does numpy ... numexpr, numba and pythran do, but you were referring to "plain python" in your original post.

Indeed, K is mostly vanilla single threaded code. When I started using K, i was also thinking "neat, simplifiied APL, but it won't be fast" (I had previous positive experience with APL). But it was way faster than plain threaded code should be. The reasons, AFAICT at the time were:

1. Access patterns are extremely predictable, which means much of the data is in L1 much of the time; That, on its own, delivers a factor of ~10 speed compared to having the same data scattered in L2 if you are memory bound.

2. The code is incredibly small and in 2003 when I looked at it, mostly fit inside the I-cache, which meant even the branch prediction available at the time was doing way better than one would expect.

3. The primitives are very well chosen and play well together.

The hidden elephant in the china shop is that your code has to be idiomatic for these to matter. See e.g. [0] from Stevan Apter, for aggregating functions in a toy database e.g. group[avg] - the version he uses, vs. 4x slower one in the comment.

I have no doubt it is possibly to beat k with numpy; it might even be easy if you are more proficient with the latter than with the former. However, for the kind of exploratory data work I did in the day, it is very hard to beat overall when you take thinking, debugging and experimenting into account. And if/when you have converged onto a well-enough defined computational pipeline, the best tool for that specific job (C++, CUDA, TF, MKL) is often the only reasonable choice.

> when I was working in the space it seemed like the hagiography and actual performance were worlds apart.

I don't know who you were talking to, but for me it was always an optimization on the "effort/result" metric, not on either on its own. Python often requires less effort, but iteration runtime speed (even with numpy) is horrible. C+CUDA delivers faster results but the iteration development speed is horrible. For me K struck a sweet spot. YMMV

[0] http://www.nsl.com/k/t.k