by geocar 4247 days ago

kOS is K5; it's as different from K4 as K4 was from K3.

Views are available in K4/Q/KDB+, so the version on kx.com does work with views. If you want to play with the views idea, try to solve a problem and see where a view helps you.

Here's one I did earlier this week; I've got this tool that takes a whole bunch of logfiles of userids and counts; they look like this:

    5033703751425413371 1
    5409789758109122623 4
    10102846067816284236 4

I call these columns `s(sessionid) and `c(count); I can load them with this:

    G:{g::select from(+`s`c!("*I";" ")0:"S"$":",x) where (c>0)}

Now I say something like: G"trk/tnl5/20141207/TRK_ITM" to get one of these structures into the g variable.
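
For readers who don't read K, here is a rough Python sketch of what that loader does as I describe it: read "sessionid count" lines, keep the sessionid as a string, and drop rows whose count is not positive. This is an analogue, not a translation of the K above; the name load_counts and the choice to skip malformed lines are assumptions made for illustration.

```python
def load_counts(lines):
    """Parse 'sessionid count' lines, keeping only rows with count > 0."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: silently skip blank or malformed lines
        session_id, count = parts[0], int(parts[1])
        if count > 0:
            rows.append((session_id, count))
    return rows


# The three sample rows shown earlier in the comment.
sample = [
    "5033703751425413371 1",
    "5409789758109122623 4",
    "10102846067816284236 4",
]
g = load_counts(sample)
```

Keeping the sessionid as a string matters later, because the corrupt ids are far too long to fit in a 64-bit integer.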

I actually have a bunch of filenames:

    t:{R:{(x,"/"),/:$:!"S"$":",x};,/R',/R'R"trk"}[]

So I can load the first one with: G t 0

This is convenient: since these files are big (1GB each or so), I'll want to work with a subset:

    g:10000#g

This isn't part of my program, it's just something I do while developing. If I run "G" again it'll get the entire file again.

Now, I want to match these against another list of users. I'll do a similar trick:

    D:{d::0:"S"$":",x}

and now I can do: D"users.txt" to put this into the d variable.

My program needs to find the rows of g that match d.

Some of the sessionids in the logfiles are corrupt. They look like this instead:

    7241110807448127691320303160517889791474867567241110807448127691320312936057606400732600664532969577042775 4
    62349552899220301013203170219134834182361546234955289922030101320303170219134834182361546234955289922030101 4

These are wrong. I originally tried to figure them out by finding the longest string that was also at the beginning of the sessionid, but after talking it out with Oleg and Pierre I decided to try to simply match them against the lengths of d:

    ?:#:'d

said I only have two lengths (19 and 20), so this is a much smaller search than what I was trying before!
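
The same check in Python, as a minimal sketch: collect the distinct lengths of the known ids, in first-seen order. The function name and the sample ids (taken from the example rows above) are only for illustration.

```python
def distinct_lengths(ids):
    """Return the distinct string lengths found in ids, in first-seen order."""
    seen = []
    for item in ids:
        n = len(item)
        if n not in seen:
            seen.append(n)
    return seen


# Hypothetical stand-in for d: two 19-digit ids and one 20-digit id.
d_sample = ["5033703751425413371", "5409789758109122623", "10102846067816284236"]
lengths = distinct_lengths(d_sample)
```

With only a couple of lengths, every sessionid needs just a couple of prefix lookups instead of a search over all possible prefixes.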

To do this, I used views:

    K::?:#:'d
    k::"S"$"S",/:$:'K
    f::k!+(K#\:/:g.s)

Now f is an index on g; instead of a `s column it has an `Sn column for each key length n; i.e. I have a `S19 column and an `S20 column holding the first 19 and the first 20 characters of `s respectively.
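
A rough Python picture of that index, assuming the sessionids are plain strings: one column per key length, each holding the prefixes of that length. The name prefix_index and the short made-up ids are illustrative. Note that Python slicing returns an id unchanged when it is shorter than n.

```python
def prefix_index(session_ids, lengths):
    """Build {'S<n>': [first n characters of each session id]} for each n."""
    return {f"S{n}": [s[:n] for s in session_ids] for n in lengths}


# Made-up short ids so the columns are easy to read.
f = prefix_index(["abcdef", "xyz123"], [3, 4])
```

Here f["S3"] is ["abc", "xyz"] and f["S4"] is ["abcd", "xyz1"], which mirrors the `S19 and `S20 columns.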

    s::g.s[&|/f[k]in\:d]
    c::g.c[&|/f[k]in\:d]

Now I can look at the `s and `c vars and do what I want to do; get my unique userids and my counts.
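
The selection step can be sketched in Python like this: keep a row if any of its prefixes, taken at one of the known id lengths, appears in the user list. This is an analogue under the assumption that ids are strings; match_rows and the sample data are hypothetical.

```python
def match_rows(rows, known_ids):
    """Keep (session_id, count) rows whose prefix at any known length is a known id."""
    known = set(known_ids)
    lengths = sorted({len(k) for k in known})
    matched = []
    for session_id, count in rows:
        if any(session_id[:n] in known for n in lengths):
            matched.append((session_id, count))
    return matched


# "abcdXYZ" stands in for a corrupt id that starts with the real id "abcd".
rows = [("abc", 1), ("abcdXYZ", 2), ("zzz", 3)]
matched = match_rows(rows, ["abc", "abcd"])
```

Using a set for the known ids keeps each lookup constant-time, so the work per row is proportional to the number of distinct lengths.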

More importantly, I can apply this to all my files (I have a lot of them):

    n:0;m:0;{G x;n+:+/c;m+:+/g.c}'t
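
As I read that last line, n accumulates the counts of matched rows and m accumulates the counts of all rows, across every file. A minimal Python sketch of the same bookkeeping follows; it assumes each file has already been parsed into (sessionid, count) rows, and the name totals is made up for illustration.

```python
def totals(files_rows, known_ids):
    """Return (matched_count_sum, all_count_sum) over every file's rows."""
    known = set(known_ids)
    lengths = sorted({len(k) for k in known})
    n = 0  # sum of counts for rows that match a known id
    m = 0  # sum of counts for all rows
    for rows in files_rows:
        for session_id, count in rows:
            m += count
            if any(session_id[:k] in known for k in lengths):
                n += count
    return n, m


# Two hypothetical "files", already parsed into rows.
file_a = [("abc", 1), ("zzz", 3)]
file_b = [("abcdXYZ", 2)]
n, m = totals([file_a, file_b], ["abc", "abcd"])
```

The ratio n/m then gives the share of logged activity that belongs to users in the list.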