Hacker News new | ask | show | jobs
by peterabbitcook 40 days ago
I’ve been skimming the source code and it looks promising for the stated use case. Wondering how to configure and set it up for a producer/consumer scenario where the producer puts compressed bytes on the wire and the consumer processes it; I can definitely see a use case where an edge sensor pumps compressed data to a cloud server with a GPU, though I don’t usually pipe doubles to a GPU.

Something worth thinking about that since you mentioned it’s geared towards “scientific” data streams. If we’re talking about precise measurements from instruments, your sensor is typically an analog signal which you digitize. Digitizers exist that can output floats, but DACs used in industry like a Rincon or Alazar (that sample at multiples of 100 MHz) prefer to output quantized shorts or ints that are rescaled to a float with a magic number (i.e. 32767/pi for a phase measurement, or gain/(16 mA) for industrial transducers) somewhere down the line. I bring this up because you pointed out your max throughput is about 120 MiB/s which would make it a big bottleneck for scientific data coming out of a digitizer that can pump out 800-1600MiB/s. 120 MiB/s throughput of doubles is not really that high for CPU level computations or network Tx bandwidth on modern hardware.

1 comments

Fair points on both. Thanks for aasking.

The 120 MiB/s encode ceiling is the cost of the mode competition. that's where the ratio comes from. At 800-1600 MiB/s off a digitizer, fc is the bottleneck no matter what transport sits behind it; for that regime zstd-3 or lz4 are the better fit, or fc further down the pipeline on aggregated/decimated data.

You're also right on int/short. fc's modes look for IEEE-754 bit patterns, so doubles that started life as rescaled ints lose the structure those modes exploit. A native int16/int32 path is on the list.

For the wiring itself: I have a sister single-header library, vibe (https://github.com/xtellect/vibe), built for this exact pattern: length-prefixed TCP/IPC framing on Linux, with a `telemetry_sink` example close to the edge-sensor --> cloud-ingest case. Producer compresses with fc, ships framed bytes through vibe, consumer decompresses. Doesn't solve the throughput ceiling, but handles the producer/consumer setup cleanly.

edit: i think the comments is flagged automatically because I used `vibe` (bad name I know) :)