| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mkaufmann 4521 days ago

Great article, I am really eager to see the solutions in github. I guess there won't be many with c++ solutions.

I only found the challenge two days before the end. My goal with the challenge was not to get the best solution, but rather use it as an exercise to practice the different languages and finding intuitive solutions.

Level 0 as you wrote was really just a two line change.

Level 1 was fun, because I tried to work with the existing bash structure as far as possible and just replaced the hot loop with inline python (something new learned). I found it interesting that this way I could directly inject the variables into the python program without having to load them as env variables (see [1]).

In Level 2 I did too much work, I did not think simple enough. I assumed that the different test cases would differ a lot and that a static divider would not help. My assumption was that the histogram of requests per second across the different IPs would have a bimodal distribution[2], thus I used the excellent "fast-stats" library to get a histogram and ban clients based on that. The library even offers approximate histograms, so even if there would be thousands of requests it would still scale.

To solve Level 3 with the given code required to changes:

-(i) Partitioning the data across the three server. I only loaded a subset of the files based on the server id (which was conveniently already available in the search servers) and

-(ii) Loading the files in memory so that they don't have to be read from disk for every search term. This was enough to pass this level. The code was too long for the gist but I can put it on github if someone wants to take a look at it. No fancy index structure required. I just scanned The files linearly on each request.

Mastering Level 4 was not possible for me, I tried working with the go-raft library and got the integration with unix sockets working, but somehow the system got unstable after a while. Fixing this was really frustrating and in the end I gave up because I could not find the problem and time was running out.

I congratulate all ~200 persons which also passed this last level. This was much more difficult than the other ones!

[1] https://gist.github.com/mkaufmann/8716922 [2] http://en.wikipedia.org/wiki/Bimodal_distribution

3 comments

recentdarkness 4521 days ago

well to level3 I can tell you, I just used grep embedded in a nodejs server application :D

Everything boiled down to this: "cd " + data_path + "; grep -rno " + key + " | cut -d: -f1,2 | sort | uniq"

I read the result from stdout and transformed the result into json that was enough to make it. I did not want at that time to spend more time on that level ;)

link

mkaufmann 4521 days ago

Damn ok, so I also missed that opportunity. Great find!

link

recentdarkness 4521 days ago

I forgot to add: I cached every result for every key which was searched for, because there was quite often that it asked 3-5 times for the same term in a row. So I had an instant result in a simple javascript object :D

link

fragmede 4521 days ago

Mind posting your code for level3? I got that partitioning was the main trick but I am too unfamiliar with scala to reassemble the responses properly.

link

ajtulloch 4521 days ago

FWIW, It was possible to pass Level 3 with ~5 lines of code changed - https://gist.github.com/ajtulloch/e05f75aaa6ba3b5b0241#file-... for my solution.

link

brown9-2 4521 days ago

Here is how I merged the responses: https://gist.github.com/mattnworb/8717911

link

necubi 4521 days ago

Or a bit cleaner using lift-json: https://gist.github.com/mwylde/8724700.

link

dkhenry 4521 days ago

I did both all my solutions in C++. I am going to clean them up a little and post them in my github account

github.com/dkhenry

link