by faltet
4863 days ago
That's right. But even in this case, Blosc, the internal compressor used in Blaze, can detect pretty early in the compression pipeline whether the data is compressible and, if it is not, stop compressing and just copy (how early that decision is taken depends on the compression level). The good news is that Blosc can still use threads in parallel to do the copy, and this normally gives significantly better speed than a non-threaded memcpy() (the default on many modern systems).
Really? I always thought that a single core could saturate the available memory bandwidth (unless you have an unusual architecture like NUMA). If you're seeing a multithreaded memcpy that performs better, maybe you're just stealing memory bandwidth from other processes (since AFAIK memory bandwidth is probably allocated on a per-thread basis), or maybe you're getting more CPU cache because you're running on multiple cores?
This would be interesting to investigate.
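One rough way to investigate is to time a plain copy against a copy split across threads. The sketch below is only an illustration and is not how Blosc does it: `threaded_copy` and `time_copy` are hypothetical names, the buffer size and thread count are arbitrary, and it relies on `ctypes.memmove` releasing the GIL during the call so that the chunks can actually be copied concurrently. Results will depend heavily on the machine, so treat any numbers as a starting point rather than a conclusion.

```python
import ctypes
import threading
import time


def threaded_copy(dst: bytearray, src: bytearray, nthreads: int = 4) -> None:
    """Copy src into dst, one contiguous chunk per thread (hypothetical helper)."""
    n = len(src)
    if len(dst) != n:
        raise ValueError("dst and src must have the same length")
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    if n == 0:
        return

    # Expose both buffers to ctypes without copying, to get raw addresses.
    src_c = (ctypes.c_char * n).from_buffer(src)
    dst_c = (ctypes.c_char * n).from_buffer(dst)
    src_addr = ctypes.addressof(src_c)
    dst_addr = ctypes.addressof(dst_c)

    chunk = -(-n // nthreads)  # ceiling division: bytes per thread
    threads = []
    for start in range(0, n, chunk):
        size = min(chunk, n - start)
        t = threading.Thread(
            target=ctypes.memmove,
            args=(dst_addr + start, src_addr + start, size),
        )
        t.start()
        threads.append(t)
    for t in threads:
        t.join()


def time_copy(nbytes: int = 64 * 1024 * 1024, nthreads: int = 4):
    """Return (single_thread_seconds, multi_thread_seconds) for one copy each."""
    src = bytearray(b"\x5a" * nbytes)  # non-zero fill so the pages are touched
    dst = bytearray(nbytes)

    t0 = time.perf_counter()
    threaded_copy(dst, src, nthreads=1)
    single = time.perf_counter() - t0

    t0 = time.perf_counter()
    threaded_copy(dst, src, nthreads=nthreads)
    multi = time.perf_counter() - t0
    return single, multi


if __name__ == "__main__":
    single, multi = time_copy()
    print(f"1 thread: {single:.4f} s, 4 threads: {multi:.4f} s")
```

A single run like this is noisy (thread start-up cost, page faults on first touch, other processes), so repeating the measurement and varying the buffer size relative to the cache size would be needed before reading much into it.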