The biggest recurring issue was deadlocks in the garbage collector. It would start cleanup in a subvolume and trip all over itself. After that, any I/O to that specific directory would never return. The only solution was to reboot the server and run fsck for a few hours.
Second frequent problem: hitting 90% capacity on a filesystem had a non-trivial chance of ruining it forever. Hit the wrong code path and, even if you immediately deleted a bunch of things, I/O to that filesystem would be 3000% slower forever.