by jonathanapp 3115 days ago
Notes from my KX/kdb experience:

1.) The in-memory DB .exe was around 500 KB. Imagine that.

2.) The Q language syntax, while consistent, is fairly arcane and throwback to decades past.

3.) The documentation and driver support is abysmal.

4.) It's supposedly extremely fast, but I can't help but wonder if this is a lot of successful PR and hype (like hedge fund bosses insisting on Oracle because it's the only db that 'scales')

I used to use KX/kdb/Q/K daily for several years. I wrote a full implementation of reinforcement learning (15 lines), a lightweight MVC framework (to show reports and tables in an internal webapp) and even a Q syntax checker (abusing tables as a data structure to hold parse trees). Good or bad, for the longest time, Q was my "go-to" programming language.

Based on that experience...

1) Yes, but that's not huge by modern standard.

2) Q is a DSL version of K. As others have commented, K is a pretty clean implementation of APL, and Q makes K more approachable.

3) I have to agree here, but Q for Mortals makes up for it.

4) It is really fast. As we all know, the vast majority of us actually don't have terabytes and terabytes of data, especially after a reasonable cleanup / ETL / applying common sense. I suppose it helped that I worked in finance, which meant my desktop had 16GB of memory in 2009 and 128GB of memory on a server shared by 4-5 traders.

Finally, Q was never intended for general-purpose computing or widespread adoption. At least when I was an active user, the mailing list had the same 20-30 people asking questions and 3-4 people answering them, including a@kx.com (= Arthur Whitney, the creator). Back then, I'd say there were at most 2-3k active users of Q/K in the world. Now that Kx Systems is part of First Derivatives and has been working on expanding their customer base, perhaps they have more...?

It is worth pointing out that really fast is ... well ... really fast. See [1] for some benchmarks they did for small, medium, large data sets.

The machines that $dayjob-1 used to build dominated the STAC-M3 for a few years (2013-2015) because we paid careful attention to how kdb liked to work, and how users liked to structure their shards. Our IO engine was built to handle that exceptionally well, so, not only did in-memory operations roar, the out of memory streaming from disk ops positively screamed on our units (and whimpered on others).

I miss those days to some degree. Was kind of fun to have a set of insanely fast boxen to work with.

[1] http://kparc.com/q4/readme.txt

> 1) Yes, but that's not huge by modern standard.

OP could have phrased it better, but I presume his point was that 500KB is extremely small by modern standards. The whole executable fits comfortably in L3, so you'll probably never have a full cache miss for instructions. On the other hand, while it's cool that it's small, I'm not sure that binary size is a good proxy for performance. Instruction cache misses are rarely going to be a limiting factor.

> Instruction cache misses are rarely going to be a limiting factor.

k's performance is a combination of a lot of small things, none of which seems that meaningful on its own. And yet, the combination screams.

The main interpreter core, for example, used to be <16K code and fit entirely within the I-cache; that means bytecode dispatch was essentially never re-fetched or re-decoded to micro instructions, and all the speculative execution predictors have a super high hit rate.

When Python switched the interpreter loop from a switch to a threaded one, for example, they got ~20% speedup[0]; I wouldn't be surprised if the fitting entirely within the I-cache (which K did and Python didn't at the time) gives another 20% speedup.

[0] https://bugs.python.org/issue4753

> And yet, the combination screams.

Yes, I presume it's very fast because of a number of smart design decisions. I would guess that the relatively small on-disk size of the executable is a consequence of these decisions, rather than a cause of the high speed. And as you point out, it's really the design of the core interpreter that matters.

> When Python switched the interpreter loop from a switch to a threaded one, for example, they got ~20% speedup[0]; I wouldn't be surprised if the fitting entirely within the I-cache (which K did and Python didn't at the time) gives another 20% speedup.

I'm familiar with this improvement, and talk it up often. Since certain opcodes are more likely to follow other opcodes (even if they are globally rare) threaded dispatch can significantly reduce branch prediction errors. But despite not having measured the number of I-cache misses on the Python benchmarks, I'd be utterly astonished if there were enough of them to allow for a 20% speedup. My guess would be that the potential is something around 1%, but if you can prove that it's more than 10% I'd be excited to help you work on solving it.
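
To make the "certain opcodes tend to follow other opcodes" argument concrete, here is a toy simulation. It is a minimal sketch, not a model of any real CPU: it assumes the simplest possible indirect-branch predictor, one that guesses "same target as last time" for each branch site. The function names and the four-opcode trace are made up for illustration. With a switch, every dispatch goes through one shared branch site; with threaded code, each opcode handler ends in its own branch.

```python
def mispredictions_switch(trace):
    """Switch dispatch: a single shared indirect branch.

    The toy predictor remembers only the last target taken by that one branch.
    """
    misses = 0
    last_target = None
    for op in trace:
        if op != last_target:
            misses += 1
        last_target = op
    return misses


def mispredictions_threaded(trace):
    """Threaded dispatch: one indirect branch at the end of each handler.

    The toy predictor remembers, per handler, which opcode followed it last time.
    """
    misses = 0
    last_successor = {}  # handler (previous opcode) -> last opcode that followed it
    prev = None          # None stands for the entry point before the first opcode
    for op in trace:
        if last_successor.get(prev) != op:
            misses += 1
        last_successor[prev] = op
        prev = op
    return misses


# Hypothetical bytecode for a tight loop: each opcode always has the same successor.
trace = ["LOAD", "PUSH", "ADD", "STORE"] * 1000

print("switch  :", mispredictions_switch(trace))
print("threaded:", mispredictions_threaded(trace))
```

On this idealized loop the shared branch mispredicts on every dispatch, while the per-handler branches settle after the first pass. Real predictors also use branch history, so the gap on actual hardware is likely smaller than this toy suggests.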

I am not involved with k, and things might have changed significantly, but around the 2003-2005 timeframe, Arthur had very conclusive benchmarks that showed I-cache residence makes a huge difference (IIRC I-cache was just 8KB those days ...).

The people who surely know what difference it makes today are Nial Dalton and Arthur Whitney.

> around the 2003-2005 timeframe, Arthur had very conclusive benchmarks that showed I-cache residence makes a huge difference

That sounds quite plausible. The front-end of Intel processors (the parts that deal with making sure there is a queue of instructions ready to execute by the backend) has made some major advances since then. The biggest jumps were probably Nehalem in 2008, and then Sandy Bridge in 2011.

It's not that binary size no longer matters, but you almost have to go out of your way to make instruction cache misses be the tightest bottleneck on a hot path. And when it would be the bottleneck, the branch predictor and prefetch are so good that it's usually only a problem when combined with poor branch prediction, so it really only adds to the delay rather than causing it.

In order for the Q interpreter to fit in that small size, the language has some rather severe limits, for example on the number of function parameters, the number of local variables, and the size of conditional branches. Forcing users to structure code around these limits feels a bit archaic to me. This is what compilers are for.

Would be really interesting to read a write up on your experience. What do you program in now? How do you look at other PLs now? What do you miss and what are you happy "just works"? What do you think other PLs (especially languages like Lisp, which are very high in terseness) can learn from Q?

I would compare Q (and other APL-related languages) to the Vim editor. There you have some carefully chosen operations which are easy to perform. They don't take much effort. They are also easy to compose in useful ways - because the corresponding properties support that. Since the basis of editing operations is fairly large, you have many operations; but when you know many of them, you can perform powerful edits.

Lisp on the other hand is more like Emacs - naturally. Here we have a small, carefully chosen orthogonal basis of abstract operations - not domain-specific, but a "theoretically-foundational" small basis. Then you have a library of macros on top of that, and the ability, of course, to extend.

In other words, the basis for APL is "classical" math, made executable and expanded with the mechanisms required to put programming constructs (logic, control flow, ordering...) in one line. It's harder to expand, but you don't often need that. Lisp is a specific branch of math, lambda calculus, which is provably enough to solve any programming problem. The "inner core" of Lisp is also hard to expand, but what you expand for your task is "the usage" of the language, which is made to be straightforwardly expandable.

To me it's hard to say what is better.
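
For anyone who hasn't seen the style, the "few operations that compose well" point about Q and APL can be roughly sketched in Python. This is not Q or APL code, and the helper names (`scan`, `fold`, `each2`, `max_drawdown`) are made up for illustration; the assumption is only that a running fold, a fold, and an element-wise map are the kind of primitives meant.

```python
from functools import reduce
from itertools import accumulate


def scan(f, xs):
    """Running fold: keeps every intermediate result."""
    return list(accumulate(xs, f))


def fold(f, xs):
    """Plain fold: collapses the list to a single value."""
    return reduce(f, xs)


def each2(f, xs, ys):
    """Apply a two-argument function element-wise across two lists."""
    return [f(x, y) for x, y in zip(xs, ys)]


def max_drawdown(prices):
    """Largest drop from a running peak, built only from the three primitives."""
    running_peak = scan(max, prices)
    drops = each2(lambda peak, price: peak - price, running_peak, prices)
    return fold(max, drops)


print(max_drawdown([100, 120, 90, 110, 80, 130]))
```

The whole computation is three primitives applied in sequence, with no explicit loop or index variable, which is roughly what an array language lets you write on one line.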

Have you open-sourced the reinforcement learning and MVC framework code? Can't wait to study it.

> 1) The in-memory DB .exe was around 500 KB. Imagine that

It still is, but the hot path is much smaller than that.

> 2) The Q language syntax, while consistent, is fairly arcane and throwback to decades past.

I can't comment about this. I don't mind the syntax. I prefer k syntax though.

> 3) The documentation and driver support is abysmal.

This is getting a lot better. The fusion API[1] goes a long way towards better "drivers", and the new documentation site[2] shows a lot of energy being put into organization. There's also Q for Mortals[3] which is linearized for people who like that.

[1]: http://code.kx.com/q/interfaces/fusion/

[2]: http://code.kx.com/q/

[3]: http://code.kx.com/q4m3/preface/

> 4) It's supposedly extremely fast, but I can't help but wonder if this is a lot of successful PR and hype

There are a lot of good benchmarks of kdb[4] unlike Oracle :)

[4]: https://stacresearch.com/m3

2) I like K better (fitting the ideals of APL); I feel Q was done by Arthur to please some big client as it doesn't feel like he would choose that kind of thing (that's from reading interviews, seeing the iterations and his basic code philosophy)

I like K better than Q too, but J[1] clicks with me more.

J has JDB[2] and Jd[3] for things somewhat similar to kdb, with Jd, rather than JDB, being the commercial offering similar to kdb.

I would probably choose APL over Q if that were a choice. In J you can always make your definitions (verbs, nouns, etc...) plain words if you like the way Q reads.

  [1] jsoftware.com
  [2] http://code.jsoftware.com/wiki/JDB
  [3] http://www.jsoftware.com/jdhelp/overview.html

Have you used Dyalog APL recently? They now have a rank operator and fork/hook from J. I am learning J but haven't made up my mind about this.

Dyalog appears to be more popular with conferences and more products, but it costs $1k ish for a commercial license and nobody else can run your code without a license and server licenses aren't cheap. It also pretty much needs a special keyboard and a key mapping. J is free for pretty much everything and uses standard characters (although I really like the APL characters). I think they're both nice.

Dyalog comes with a keyboard layout (on a Mac it just replaces the alt keys). It's quite easy to use. GNU APL's Emacs mode does the same thing, although mapped to super rather than alt (meta, in Emacs) by default.

I'm aware of the licensing costs, I'm more curious about whether Dyalog is obtaining popularity versus J, and if so, why. Of course, three new people going to the Dyalog conference would be a 10% increase in popularity, it looks like… so maybe this far out on the long tail it doesn't matter.

Yeah, I was just saying the key mappings can be a pain and your favorite keyboard probably doesn't have the APL symbols on it. The Dyalog IDE has a virtual keyboard, but I don't like those too much. If none of that bothers you, then no biggie. I'm guessing Dyalog has more production users and a bit more users than you see at the conference as they are typically held in the UK. J is free, so I bet a lot more people try it even though Dyalog has a free hobby license. J has a nice built-in plotting library, "viewmat", while Dyalog has sharpleaf. Both are nice, but sharpleaf has a GUI like doing charts in Excel. Dyalog can easily hook in to .NET, so that is pretty helpful on Windows in the real world. I'd agree it's a wash right now. What are your background and needs?

I'm more comfortable in J but there are design choices in K that really appeal to me.

My ideal language would be K's function syntax and k-tree data structure/namespaces, but with J's primitives and standard library.

Any other downsides? Management has been convinced and we're apparently switching to it at work soon but there is so little information and most of that is marketing, so it's hard to get an idea of what we're walking into.

price

Hedge funds would be the ones to use kdb over Oracle, wouldn't they?