An alternative to user5994461's suggestion would be to switch from CephFS to a different distributed filesystem. If you need POSIX and can't go directly to S3, you could try ObjectiveFS[1]. Using the local SSD instance store and memory for caching, you can get very good performance even for small-file workloads[2].
For small random reads (16 parallel threads) over the Linux kernel source tree, reads that hit the SSD instance store disk cache have a 95th-percentile latency of 6ms with ~50MB/s throughput. Reads that hit the memory cache have a 95th-percentile latency of <1ms with ~380MB/s throughput. We have more performance data available at https://objectivefs.com/howto/performance-amazon-efs-vs-obje...
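
If you want to compare against your own mount, here is a rough Python sketch of that kind of measurement: random whole-file reads from a directory tree using 16 threads, reporting 95th-percentile latency and aggregate throughput. This is not the harness behind the numbers above; the function names, the sample count and the nearest-rank percentile are my own assumptions, and it does nothing to control whether reads hit the memory cache or the disk cache.

```python
import math
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully; return (latency in seconds, bytes read)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        size = len(f.read())
    return time.perf_counter() - start, size


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def benchmark(root, threads=16, samples=1000, seed=0):
    """Randomly read `samples` files under `root` using `threads` workers."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root)
        for name in names
        if os.path.isfile(os.path.join(dirpath, name))  # skips broken symlinks
    ]
    rng = random.Random(seed)
    picks = [rng.choice(paths) for _ in range(samples)]

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(read_file, picks))
    wall = time.perf_counter() - wall_start

    latencies = [latency for latency, _ in results]
    total_bytes = sum(size for _, size in results)
    return {
        "p95_ms": percentile(latencies, 95) * 1000,
        "mb_per_s": total_bytes / wall / 1e6,
    }
```

Call it as `benchmark("/path/to/linux")` on the filesystem under test. Run it twice if you want to see the cold versus warm cache difference.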