by derefr 2524 days ago
I'm reminded to add, in the vein of the author's complaint, that there is a similar ridiculousness in Erlang land, one that cannot be circumvented so easily: reading or writing zlib-compressed files using Erlang's file:open(..., [compressed]) option (or generating zlib-compressed ETF binaries using erlang:term_to_binary(..., [compressed]) and parsing them back with erlang:binary_to_term/1) holds [the moral equivalent of†] a global lock. Only one process can be zlib-compressing or zlib-decompressing a chunk of data at once, no matter how many cores your system has.

This means that, even when your data set compresses so well that you'd theoretically gain a ton of speed by streaming the data from disk compressed and decompressing it during parsing, the speedup doesn't materialize in practice, since you're introducing an artificial bottleneck in your IO reads.

I'm not actually sure if this is a bug in Erlang, per se, or if it's just the intended behavior and compressed file IO was never intended to be used for performance, only for e.g. embedded devices with tiny ROMs.

(If people here think it's a bug, I'll probably go to the effort at some point to profile the performance impact and submit it as a bug on https://bugs.erlang.org.)
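A minimal sketch of such a measurement, assuming zlib:compress/1 is a fair stand-in for the compression paths mentioned above. The module name zlib_parallel_sketch and the function run/2 are made up for illustration. It runs the same compression in N processes at once and returns the wall-clock time in microseconds:

```erlang
-module(zlib_parallel_sketch).
-export([run/2]).

%% Hypothetical helper: compress Payload in N concurrent processes and
%% return the elapsed wall-clock time in microseconds.
run(N, Payload) when is_integer(N), N > 0, is_binary(Payload) ->
    Parent = self(),
    {Micros, ok} = timer:tc(fun() ->
        Refs = [spawn_worker(Parent, Payload) || _ <- lists:seq(1, N)],
        %% Wait until every worker has reported completion.
        lists:foreach(fun(Ref) -> receive {done, Ref} -> ok end end, Refs)
    end),
    Micros.

%% Spawn one worker that compresses the payload once, then notifies Parent.
spawn_worker(Parent, Payload) ->
    Ref = make_ref(),
    spawn_link(fun() ->
        _ = zlib:compress(Payload),
        Parent ! {done, Ref}
    end),
    Ref.
```

Calling run(1, Payload) and then run(8, Payload) on a machine with at least 8 schedulers gives a rough signal. If the second figure is close to 8 times the first, the work is probably being serialized. If the two are close to each other, it is running in parallel. A payload such as binary:copy(<<"abcdefgh">>, 1000000) should be large enough that spawn overhead does not dominate.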

† What actually happens is that all zlib compression/decompression passes get sent to a zlib "port driver" as messages. Port drivers can handle multiple requests in flight at once (they're the in-process equivalent of sockets: each Erlang process's port against the port driver is its own "connection"), but the zlib port driver shim is coded to expose zlib as a single-threaded, blocking, request-response style of server, rather than one that accepts connections in parallel and instantiates a separate zlib context for each connection it receives.

1 comment

That's probably not true in recent versions: in OTP 20 the zlib integration was reworked and is now based on NIFs (similar to Java's JNI) instead of port drivers.