Hacker News new | ask | show | jobs
by mwcampbell 4255 days ago
So would k be even faster if it were compiled to machine code rather than being interpreted? Would that involve an unacceptable speed versus space trade-off? Or am I missing some crucial reason why k code has to be interpreted?
3 comments

According to wikipedia's uncited 'performance characteristics' section of their page on K (programming language):

> The small size of the interpreter and compact syntax of the language makes it possible for K applications to fit entirely within the level 1 cache of the processor.

Sounds like the overhead's acceptable?

.

Edit: the following page (2002) says

> Even though K is an interpreted language, the source code is somewhat compiled internally, but not into abstract machine code like Java or Python. The interpreter can be invoked interactively or non-interactively, and you can also compile a source file down to a binary file for product distribution, if you wish.

http://www.kuro5hin.org/story/2002/11/14/22741/791

Thanks for the link to kuro5hin btw - it's a nice intro to k but also I've not been on k5 for years, I had no idea it was still going :)
It would be even faster, with a sufficiently smart compiler (a feasible one, not they mythical beast[0]). E.g., the function

    sumsquares:{+/a*a*a*a:1+!x}
which sums pow(a,4) for a from 1 to x (its argument), when interpreted, allocates a vector of length x, initializes it to the numbers 0..(x-1), then allocates another vector of length(x), to which it copies the first vector while adding 1 to each element, and calls it a. Then makes another vector of length(x), puts a * a in respective elements, etc. - and finally, sums the elements of the remaining vector.

If we can learn from numpy (K equiv) -> numexpr (same work, smarter about allocations) -> numba (jit compiler, avoids allocations), we can expect x2 faster speed in both "upgrades" - about x4 overall.

If that sounds too little speedup for so much effort, you are not giving modern caches enough credit. Almost every single memory access here is in L1, as they are all sequential and perfectly predictable. So it's just 2-5 slower than doing the perfect register allocation with no memory access - but with the cost of ballooning code size (going out of L1 itself ...). Also, interpretation overhead is nearly nonexistent because operation on vectors are coded in tight cache aware C loops.

Overall in the interpreter/compiler tradeoff, K comes out far enough ahead of everything else that it did not make sense for them to invest in a sufficiently smart compiler. In fact, it is only a few months ago that they bothered using SSE/AVX vector instructions for significant speedup. It is so far ahead of the curve (for its use cases) that doing that didn't really give any marketing advantage until recently (and I'm not sure it does now).

[0] http://c2.com/cgi/wiki?SufficientlySmartCompiler

I don't think so. Many of K's primitives operate on arrays using stride 1 access. Being cache-friendly accounts for much of the performance gains.

In addition, K is written in ANSI C to maintain portability.