I don't understand, it should be pretty easy. A rolling average with BigDecimal would probably be sufficient, but a scientific library might be better for a rolling average of more than a hundred million numbers.
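For illustration, here is a minimal sketch of the rolling-average idea in Python rather than Java. The function name `running_mean` is made up for this example, and `Decimal` stands in for BigDecimal. It updates the mean incrementally, so it never has to hold a sum of hundreds of millions of values:

```python
from decimal import Decimal

def running_mean(values):
    """Return the mean of an iterable using an incremental update.

    Works with floats or Decimals; returns None for an empty input.
    """
    mean = None
    for n, x in enumerate(values, start=1):
        if mean is None:
            mean = x                    # first value is the mean so far
        else:
            mean += (x - mean) / n      # shift the mean toward the new value
    return mean

# Decimal input keeps decimal arithmetic throughout (hypothetical sample data).
print(running_mean([Decimal("1.5"), Decimal("2.5")]))
```

Whether this is precise or fast enough for a billion rows is a separate question; it only shows that the averaging itself needs constant memory.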
The difficulty is creating the fastest implementation. If you look at the results of the submissions so far you’ll see a big difference in duration, between 11 seconds and more than 4 minutes.
11 seconds seems pretty impressive for a 12 GB file. Would be interesting to know what programming language could do it faster. For a database comparison you’d probably want to include loading the data into your database for a fair comparison.
That's strange, you should be able to stream the file right into a tiny Perl executable at the same speed as the bottlenecking hardware. The kernel will take care of all the logistics. You're probably trying to do too much explicitly. Just use a pipe. Perl should be done before the JIT completes.
Using cat to redirect the file to /dev/null takes 18s on my machine (a low-end NUC). Just running a noop on the file in Perl (i.e., feeding it into a `while (<>)` loop but not acting on the contents) takes ~2 minutes.
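If anyone wants to reproduce that kind of baseline on their own hardware, here is a rough sketch in Python (not Perl, so the absolute numbers will differ). It times a raw chunked read against a do-nothing line loop. The helper names `read_raw`, `read_lines` and `timed` are invented for this example, and the 1 MiB chunk size is just an assumption:

```python
import time

def read_raw(path, chunk_size=1 << 20):
    """Read the file in large binary chunks; return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total

def read_lines(path):
    """Iterate over the file line by line without parsing; return the line count."""
    total = 0
    with open(path, "rb") as f:
        for _ in f:
            total += 1
    return total

def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(read_raw, path)` and then `timed(read_lines, path)` on the same file gives a rough idea of how much of the runtime is I/O and how much is per-line overhead. The second run may be served from the page cache, so the order matters.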
Why are you using cat at all? Use a pipe. This isn't hard stuff. Don't use `<>`, feed the file into a scalar or array. It should only take a few seconds to process a billion lines.
It's really not. We're talking about gigahertz CPUs and likely solid state storage that can stream many GB/s, running through a Perl script. There really isn't much that is faster than that.