That's a great point. I do have a team that understands the underlying technologies and has successfully troubleshot several production problems with Rook/Ceph, including a recent case of file system corruption. My original post was only trying to say that our engineering team does not maintain deep operational knowledge of the best way to configure, manage, monitor, scale, etc. (i.e., operate) Ceph in production. We rely on the Rook operator for that.

Troubleshooting acute outages caused by hardware or software failures requires a different skill from properly configuring the system to scale and to minimize the chances of corruption or outages. Rook solves the latter, but we do understand the architecture and what Rook (and Ceph) are doing. We've just removed the need for the expert-level, craftsman, specialty knowledge required to operate Ceph, because we decided, after a thorough evaluation, that the software in this case is the most capable solution.