by noahdesu 4786 days ago
Platform support (http://ceph.com/docs/master/install/os-recommendations/#plat...) and deployment tools have come a long way. In the past, setup has been complicated (Ceph is inherently more complex than other systems), but it is getting much easier. There is also extensive documentation at http://ceph.com/docs, as well as a very active IRC channel and mailing lists for support.

Ceph is much larger than just the file system (as the article points out). And while many Ceph products/subsystems are used extensively in production environments (RBD block store, RGW, and RADOS), CephFS isn't officially supported as production-ready.

Despite that, we run Hadoop on top of CephFS, and can deal with the occasional metadata server problem. CephFS is actively being hardened.

1 comments

Correct, Ceph is much larger than the file system. As I said at the beginning of the article, I'm using each Ceph component (RBD, RadosGW, and CephFS), and I will write an article for each of them. This is just a getting-started guide of sorts. I'm interested in running Hadoop on top of CephFS; is it stable enough?
The instability we have seen is with the metadata server, but we have been able to push fixes upstream relatively quickly as we encounter them. The focus has been on stability, and we have been running pretty large terasort jobs without issue. In the upcoming Cuttlefish release, due out any time now, locality information will be exposed to Hadoop for better task scheduling. We will start focusing on improving Hadoop performance soon, now that things have stabilized.