|
|
|
|
|
by ashtonsix
247 days ago
|
|
I just did a mini-ablation study for this (prefix sum). By getting rid of the cross-block carry (16 values), you can increase perf from 19.85 to 23.45 GB/s: the gain is modest as most performance is lost on accumulator carry within the block. An absolute value every 16 deltas would undermine compression: a greater interval would lose even the modest performance gain, while a smaller interval would completely lose the compressibility benefits of delta coding. It's a different matter, although there is definitely plausible motivation for absolute values every X deltas: query/update locality (mini-partition-level). You wouldn't want to transcode a huge number of values to access/modify a small subset. |
|
Even if one is willing to adopt such an encoding scheme, you'd still want to optimize what you have here anyways though. It also doesn't help, as mentioned, if the goal is actually latency of small streams rather than throughput of large ones.