|
|
|
|
|
by ein0p
519 days ago
|
|
ZFS on Linux unfortunately has a long standing bug which makes it unusable under load: https://github.com/openzfs/zfs/issues/9130. 5.5 years old, nobody knows the root cause. Symptoms: under load (such as what one or two large concurrent rsyncs may generate over a fast network - that's how I encountered it) the pool begins to crap out and shows integrity errors and in some cases loses data (for some users - it never lost data for me). So if you do any high rate copies you _must_ hash-compare source and destination. This needs to be done after all the writes are completed to the zpool, because concurrent high rate reads seem to exacerbate the issue. Once the data is at rest, things seem to be fine. Low levels of load are also fine. |
|
https://github.com/openzfs/zfs/issues/9130#issuecomment-2614...
That said, there are many others who stress ZFS on a regular basis and ZFS handles the stress fine. I do not doubt that there are bugs in the code, but I feel like there are other things at play in that report. Messages saying that the txg_sync thread has hung for 120 seconds typically indicate that disk IO is running slowly due to reasons external to ZFS (and sometimes, reasons internal to ZFS, such as data deduplication).
I will try to help everyone in that issue. Thanks for bringing that to my attention. I have been less active over the past few years, so I was not aware of that mega issue.