by derefr
2524 days ago
I don't know what Java's BufferedReader is doing, but it's probably not the optimal thing in terms of IO throughput. I would blame the algorithm long before blaming anything inherent about the JVM.

Erlang is another language where "naive" IO is kind of slow. https://github.com/bbense/beatwc/ is a project someone did to test various methods of doing IO in Erlang/Elixir, comparing their performance on a line-counting task against the Unix wc(1) command. It's interesting to see which approaches are faster. Yes, parallelism gains you a bit, but a much larger win comes from avoiding the stutter-stop effect of cutting the read buffer off whenever you hit a newline.

Instead, the read buffer should be the same size as your IO source's optimal read-chunk size (a disk block; a large TCP packet), and you should grab a whole buffer-ful of lines at a time, do a pattern-matching binary scan to collect the indices of all the newlines, and then use those indices to parcel the buffer out as slice references. This achieves quite a dramatic speedup, since most of the time you don't need independent copies of the lines, and you can copy a line (or more likely just part of it) yourself when you need to hold onto it. This approach is probably also already built in to Java's "better" IO libraries, like NIO.
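To make the chunk-then-scan idea concrete, here is a rough Python sketch (not taken from beatwc, and not a claim about what BufferedReader or NIO do internally). The names `newline_indices`, `iter_line_slices`, `count_lines` and the 64 KiB `CHUNK_SIZE` are all made up for illustration; the right chunk size depends on your IO source.

```python
import io

# Assumed value for illustration; tune to the IO source's preferred read size.
CHUNK_SIZE = 64 * 1024


def newline_indices(buf):
    """Return the offset of every b'\\n' in buf, found in one forward scan."""
    indices = []
    pos = buf.find(b"\n")
    while pos != -1:
        indices.append(pos)
        pos = buf.find(b"\n", pos + 1)
    return indices


def iter_line_slices(stream, chunk_size=CHUNK_SIZE):
    """Yield each line (without its newline) as a memoryview slice.

    Lines are handed out as views into the current buffer rather than as
    individual copies; call bytes(view) only when a line must be kept.
    """
    carry = b""  # unfinished line left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        view = memoryview(buf)
        start = 0
        for end in newline_indices(buf):
            yield view[start:end]
            start = end + 1
        carry = buf[start:]  # keep only the tail that has no newline yet
    if carry:
        yield memoryview(carry)  # final line with no trailing newline


def count_lines(stream, chunk_size=CHUNK_SIZE):
    """Count newlines wc-style: no per-line slicing needed at all."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(newline_indices(chunk))


if __name__ == "__main__":
    data = io.BytesIO(b"alpha\nbeta\n\ngamma")
    for line in iter_line_slices(data, chunk_size=4):
        print(bytes(line))
```

The tiny `chunk_size=4` in the usage lines is only there to exercise the case where a line straddles two chunks; in practice you would pass an open binary file (`open(path, "rb")`) and leave the chunk size large.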
edit: Lemire is showing a lack of what Fowler describes as "mechanical sympathy".
https://martinfowler.com/articles/lmax.html#QueuesAndTheirLa...