by markhpc
886 days ago
I wanted to chime in and mention that we've never seen any issues with IOMMU before in Ceph. We have a previous generation of the same 1U Dell chassis with AMD Rome processors in the upstream Ceph lab, and they don't suffer from the same issue despite performing similarly at the same scale (~30 OSDs). The customer did say they've seen this in the past in their data center. I'm hoping we can work with AMD to figure out what's going on.

I did some work last summer kind of duct-taping the OSD's existing threading model (double buffering the hand-off between the async msgr and worker threads, adaptive thread wakeup, etc.). I could achieve significant performance/efficiency gains under load, but at the expense of increased low-load latency (Ceph by default is very aggressive about waking up threads when new IO arrives for a given shard). One of the other core developers and I discussed it, and we both came to the conclusion that it probably makes sense to do a more thorough rewrite of the threading code.
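For readers unfamiliar with the idea, here is a rough Python sketch of a swap-style hand-off with batched wake-ups. It is not Ceph code and not necessarily what the experiment did: `DoubleBufferedHandoff`, `take_batch`, and `run_demo` are hypothetical names made up for illustration, and "double buffering" is interpreted here as producers filling one list while the worker drains another.

```python
import threading


class DoubleBufferedHandoff:
    """Hypothetical hand-off queue (illustration only, not Ceph's implementation).

    Producers append to a front buffer; the worker swaps it out for an
    empty one and processes the whole batch outside the lock.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._front = []       # buffer currently being filled by producers
        self._closed = False

    def push(self, item):
        with self._cond:
            was_empty = not self._front
            self._front.append(item)
            # Notify only on the empty -> non-empty transition, so a burst
            # of pushes costs one wake-up instead of one per item.
            if was_empty:
                self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def take_batch(self):
        """Block until items arrive, then swap buffers and return the batch.

        Returns an empty list once the queue is closed and fully drained.
        """
        with self._cond:
            while not self._front and not self._closed:
                self._cond.wait()
            batch, self._front = self._front, []
            return batch


def run_demo(n_items=1000):
    """One producer, one worker; returns (items received, batch count)."""
    q = DoubleBufferedHandoff()
    received = []
    batches = 0

    def worker():
        nonlocal batches
        while True:
            batch = q.take_batch()
            if not batch:      # closed and drained
                return
            batches += 1
            received.extend(batch)

    t = threading.Thread(target=worker)
    t.start()
    for i in range(n_items):
        q.push(i)
    q.close()
    t.join()
    return received, batches
```

Calling `run_demo()` hands every item to the worker in order, usually in far fewer batches than items when the producer is fast. The number of batches depends on thread scheduling, so it varies from run to run.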