Hacker News
by marginalia_nu 1550 days ago
Outputting and manipulating side-effects can be useful even in imperative code.

I had real performance problems a while back trying to build a very large file. My code could only produce the write instructions out of order, and the file was too big to hold in memory, so I ended up writing the instructions in radix-grouped batches to a bunch of temporary files, and then reading and evaluating those to build the large file.
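
Here is a minimal Python sketch of that two-pass approach. The assumptions are mine, not the author's: each write instruction is a hypothetical `(offset, data)` pair, instructions are grouped by `offset // bucket_bytes` (the names `build_file`, `bucket_bytes` and the record layout are made up), writes don't overlap, one bucket fits in memory, and there are few enough buckets to keep every temp file open at once.

```python
import os
import struct
import tempfile

# Hypothetical on-disk record for one write instruction:
# 8-byte offset, 4-byte payload length, then the payload bytes.
HEADER = struct.Struct("<QI")


def build_file(out_path, instructions, file_size, bucket_bytes=1 << 20):
    """Apply out-of-order (offset, data) writes via bucketed temp files.

    Assumes writes do not overlap: a write that crosses a bucket boundary
    can be replayed in a different order relative to other writes.
    """
    n_buckets = (file_size + bucket_bytes - 1) // bucket_bytes
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i}.bin") for i in range(n_buckets)]
        # Pass 1: append each instruction to its bucket's temp file,
        # so every temp file is only ever written sequentially.
        handles = [open(p, "wb") for p in paths]
        try:
            for offset, data in instructions:
                bucket = offset // bucket_bytes
                handles[bucket].write(HEADER.pack(offset, len(data)))
                handles[bucket].write(data)
        finally:
            for h in handles:
                h.close()

        # Pass 2: replay buckets in offset order, so writes to the output
        # file stay inside one bucket_bytes-sized region at a time.
        with open(out_path, "wb") as out:
            out.truncate(file_size)  # pre-size the output file
            for p in paths:
                with open(p, "rb") as f:
                    blob = f.read()  # one bucket is assumed to fit in memory
                pos = 0
                while pos < len(blob):
                    offset, length = HEADER.unpack_from(blob, pos)
                    pos += HEADER.size
                    out.seek(offset)
                    out.write(blob[pos:pos + length])
                    pos += length
```

Each temp file is only appended to, and the output file is visited one region at a time. A larger `bucket_bytes` means fewer temp files but a wider region to jump around in during the replay.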

This seems counterintuitive, since it more than doubles the amount of data written to disk and adds a reading step. But doing it this way means the data is written in a pattern the hardware can handle much more efficiently: sequential access to and from the instruction files (off a mechanical drive), and densely clustered writes to the big output file (on an SSD, strictly sequential writes matter less than landing in the same block).
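
One rough way to see the locality argument is to count how often consecutive writes land in a different block than the previous write. This is a toy model, not a disk benchmark: `count_block_switches` and `radix_group` are hypothetical helpers, and the 4 KiB block size is just an assumed value.

```python
import random


def count_block_switches(offsets, block_size):
    """Count consecutive writes that land in a different block.

    A crude locality proxy: each switch is a potential seek on a spinning
    disk, or a new block touched on an SSD.
    """
    switches = 0
    prev = None
    for off in offsets:
        block = off // block_size
        if prev is not None and block != prev:
            switches += 1
        prev = block
    return switches


def radix_group(offsets, bucket_size):
    """Stable grouping by offset // bucket_size, like the temp-file batches."""
    buckets = {}
    for off in offsets:
        buckets.setdefault(off // bucket_size, []).append(off)
    return [off for key in sorted(buckets) for off in buckets[key]]


# 64 small writes per 4 KiB block, 1000 blocks, produced in random order.
offsets = list(range(0, 4096 * 1000, 64))
random.Random(0).shuffle(offsets)
print(count_block_switches(offsets, 4096))                     # nearly every write switches
print(count_block_switches(radix_group(offsets, 4096), 4096))  # 999
```

After grouping, each block is entered exactly once, so the switch count drops to the number of blocks minus one, regardless of how the writes were ordered inside each group.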

This reduced the runtime from several hours to like 5 minutes.

1 comment

I think it's always fascinating to find situations where counterintuitively it is faster to do more work.

For example, it took me a while to realize that most of the time it's actually faster overall to read/write compressed data. You'd think that reading from a disk and then decompressing would be slower than just reading uncompressed data directly, but because disk I/O is so much slower than CPU decompression, it's almost always faster to do disk I/O on compressed data. I say "almost always" because I'm not sure how the tradeoff looks for current high-performance PCIe SSDs (or other storage devices with very fast I/O).
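
A back-of-envelope model of that tradeoff, with every throughput number made up for illustration: `ratio` is uncompressed size divided by compressed size, and `overlapped=True` assumes reading and decompression run concurrently, so the slower of the two dominates.

```python
def uncompressed_read_s(size_mb, disk_mb_s):
    """Seconds to read size_mb of raw data at disk_mb_s."""
    return size_mb / disk_mb_s


def compressed_read_s(size_mb, disk_mb_s, ratio, decomp_mb_s, overlapped=True):
    """Seconds to read and decompress data that is size_mb once uncompressed."""
    io = (size_mb / ratio) / disk_mb_s  # fewer bytes come off the disk
    cpu = size_mb / decomp_mb_s         # but every byte must be decompressed
    return max(io, cpu) if overlapped else io + cpu


# Hypothetical numbers: 1000 MB of data, 3:1 compression, 1000 MB/s decompression.
hdd, nvme = 150, 5000  # assumed disk throughputs in MB/s
print(f"HDD:  raw {uncompressed_read_s(1000, hdd):.2f}s, "
      f"compressed {compressed_read_s(1000, hdd, 3, 1000):.2f}s")
# HDD:  raw 6.67s, compressed 2.22s
print(f"NVMe: raw {uncompressed_read_s(1000, nvme):.2f}s, "
      f"compressed {compressed_read_s(1000, nvme, 3, 1000):.2f}s")
# NVMe: raw 0.20s, compressed 1.00s
```

With overlap, this model says compressed reads win exactly when the ratio is above 1 and decompression throughput exceeds disk throughput, which is why a fast enough drive can flip the result.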