Very interesting. It might be that zstd optimizes more for recent machines: zstd famously splits its Huffman-coded literals into four streams to maximize instruction-level parallelism, and that might not work as well on older machines. I haven't seen any machine where zstd is significantly slower than it should be, but the machines I could test all date from 2013 or later. Alternatively, the RHEL package might have been built with optimizations for recent machines. It would be interesting to test a binary optimized for the current machine (-march=native -mtune=native).
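
For what it's worth, here is a minimal sketch of how the two binaries could be compared once a native build exists. It only measures wall-clock time of the whole process, and the helper names (`best_time`, `compare`) are made up for illustration, not part of any zstd tooling.

```python
import subprocess
import time


def best_time(cmd, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` runs of cmd."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so terminal or disk writes do not dominate the timing.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def compare(binaries, args, repeats=5):
    """Time each binary with the same arguments; returns {label: seconds}."""
    return {label: best_time([path, *args], repeats)
            for label, path in binaries.items()}
```

Something like `compare({"distro": "/usr/bin/zstd", "native": "./zstd-native"}, ["-19", "-c", "sample.bin"])` would then give one number per binary. Both paths and the input file are placeholders; taking the best of several runs is just a cheap way to reduce noise from caching and other processes.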