HN Mirror


	by lessbergstein 898 days ago
	I don't understand; it should be pretty easy. A rolling average with BigDecimal would probably be sufficient, but a scientific lib might be better for a rolling average of more than a hundred million numbers. https://stackoverflow.com/questions/277309/java-floating-poi...
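
The post above suggests a rolling average with BigDecimal. Here is a minimal Java sketch of that idea. It assumes "rolling average" means a cumulative running mean, not a fixed-size moving window. It keeps an exact BigDecimal sum plus a count and divides only when the mean is requested, so no rounding error builds up across additions. The class name `RunningMean`, the string-input `add` method and the HALF_UP rounding are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical class: a cumulative (not windowed) running mean kept exactly in BigDecimal.
public class RunningMean {
    private BigDecimal sum = BigDecimal.ZERO; // exact sum of every value added so far
    private long count = 0;

    // Parses the value from its decimal text, so "0.1" is stored exactly (unlike a double).
    public void add(String decimalText) {
        sum = sum.add(new BigDecimal(decimalText));
        count++;
    }

    // Divides once, at the end, rounding HALF_UP to the requested number of decimal places.
    public BigDecimal mean(int scale) {
        if (count == 0) {
            throw new IllegalStateException("no values added");
        }
        return sum.divide(BigDecimal.valueOf(count), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        RunningMean m = new RunningMean();
        for (String s : new String[] {"0.1", "0.2", "0.4"}) {
            m.add(s);
        }
        System.out.println(m.mean(1)); // 0.7 / 3 = 0.2333..., printed as 0.2
    }
}
```

BigDecimal is immutable, so each `add` generally creates a new object. That keeps the result exact, but it is likely much slower than primitive arithmetic when repeated a billion times.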

ako 898 days ago

The difficulty is creating the fastest implementation. If you look at the results of the submissions so far you’ll see a big difference in duration, between 11 seconds and more than 4 minutes.

11 seconds seems pretty impressive for a 12 GB file. Would be interesting to know what programming language could do it faster. For a database comparison you’d probably want to include loading the data into your database for a fair comparison.

lessbergstein 898 days ago

Perl would do it quite fast, and it has the benefit of accessing POSIX primitives directly.

gerikson 898 days ago

A naive Perl solution is really, really slow compared to even the reference Java implementation. (I know, I've tried.)

lessbergstein 898 days ago

That's strange; you should be able to stream the file right into a tiny Perl executable at the same speed as the bottlenecking hardware. The kernel will take care of all the logistics. You're probably trying to do too much explicitly. Just use a pipe. Perl should be done before the JIT completes.

gerikson 898 days ago

Using cat to redirect the file to /dev/null takes 18 s on my machine (a low-end NUC). Just running a no-op on the file in Perl (i.e. feeding it into a `while (<>)` loop but not acting on the contents) takes ~2 minutes.

1B lines is a lot, and Java ain't a slouch.
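
Here is a rough Java counterpart to the `while (<>)` no-op loop described above. It reads standard input line by line and only counts the lines, so it could be timed against `cat` and the Perl loop on the same file. The class and method names are hypothetical, and no timing from this thread is implied for it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class: a rough Java counterpart to Perl's no-op `while (<>) {}` loop.
public class NoopLines {
    // Reads every line and discards it. Each line is still decoded into a String,
    // which is exactly the per-line cost this loop is meant to expose.
    public static long countLines(Reader in) {
        long lines = 0;
        try (BufferedReader r = new BufferedReader(in, 1 << 16)) {
            while (r.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Reads standard input, so the file can be supplied through a pipe or a redirect.
        System.out.println(countLines(new InputStreamReader(System.in, StandardCharsets.UTF_8)));
    }
}
```

Assuming a data file named `measurements.txt` (a hypothetical name), on Java 11 or later it could be timed with `time java NoopLines.java < measurements.txt`, since single source files can be launched directly.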

lessbergstein 898 days ago

Why are you using cat at all? Use a pipe. This isn't hard stuff. Don't use `<>`; feed the file into a scalar or array. It should only take a few seconds to process a billion lines.

https://www.perl.com/pub/2003/11/21/slurp.html/#:~:text=Anot....
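
The linked article is about slurping a whole file into a Perl scalar. A hedged Java analogue reads the file into one byte array with `Files.readAllBytes` and scans the bytes for newlines, so no per-line objects are created. One caveat applies to this thread: a single Java array holds at most about 2 GB, so `readAllBytes` cannot slurp the ~12 GB file mentioned above in one piece. This sketch (hypothetical class `SlurpCount`) only suits smaller files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class: "slurp" the whole file into memory, then scan it once.
public class SlurpCount {
    // Counts '\n' bytes in an in-memory buffer; no per-line Strings are created.
    public static long countNewlines(byte[] data) {
        long n = 0;
        for (byte b : data) {
            if (b == '\n') {
                n++;
            }
        }
        return n;
    }

    // Files.readAllBytes throws OutOfMemoryError for files too large for one array (~2 GB).
    public static long countNewlines(Path file) {
        try {
            return countNewlines(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countNewlines(Path.of(args[0])));
    }
}
```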

gerikson 898 days ago

I profiled my attempt; actually reading each line is the bottleneck.
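
If reading individual lines is the bottleneck, one common workaround is to avoid building lines at all: read large raw chunks and scan the bytes directly. Below is a minimal Java sketch, assuming only a line count is wanted. A real solution would also parse each line's fields out of the bytes. The class name and the 1 MiB chunk size are assumptions. Any speedup over `BufferedReader.readLine` would need to be measured on the actual machine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical class: scan a stream in fixed-size byte chunks instead of reading lines.
public class ChunkedNewlines {
    // Memory use stays at one buffer regardless of input size, so it also works on a pipe.
    public static long countNewlines(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        long n = 0;
        try {
            int read;
            while ((read = in.read(buf)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (buf[i] == '\n') {
                        n++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        // 1 MiB chunks from standard input (an assumed, untuned size).
        System.out.println(countNewlines(System.in, 1 << 20));
    }
}
```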

lessbergstein 898 days ago

Perl is always going to be much faster than Java at tasks like this. Use stdin and chomp() instead of reading each line explicitly.

This is really a small, trivial task for a Perl script. Even with a billion lines this is nothing for a modern CPU and Perl.

petters 898 days ago

It’s easy to solve, but even FizzBuzz becomes complicated if you want double-digit GB/s output.
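
To illustrate petters' point in Java: a common first step toward fast FizzBuzz output is to stop calling `System.out.println` for every number, since it can flush each line. Instead, write through a large buffer and flush once at the end. This sketch uses a hypothetical class name and an arbitrarily chosen 64 KiB buffer. It shows only that first step and is likely far from the double-digit GB/s mentioned above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical class: FizzBuzz for 1..n written through one large buffer.
public class BufferedFizzBuzz {
    public static void fizzBuzz(long n, OutputStream out) {
        try {
            Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.US_ASCII), 1 << 16);
            for (long i = 1; i <= n; i++) {
                if (i % 15 == 0) {
                    w.write("FizzBuzz");
                } else if (i % 3 == 0) {
                    w.write("Fizz");
                } else if (i % 5 == 0) {
                    w.write("Buzz");
                } else {
                    w.write(Long.toString(i));
                }
                w.write('\n');
            }
            w.flush(); // flush, not close: the caller owns `out` (e.g. System.out)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long n = args.length > 0 ? Long.parseLong(args[0]) : 100;
        fizzBuzz(n, System.out);
    }
}
```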

lessbergstein 898 days ago

It's really not. We're talking about gigahertz CPUs and likely solid-state storage that can stream many GB/s, running through a Perl script. There really isn't much that is faster than that.