by raggiskula 4422 days ago
Well, we used it as a storage backend for VMs a few years back.

For some reason it usually failed under high load, resulting in bad split-brains, extremely slow IO and loss of data. Very bad. We contacted Red Hat about it and were told that Gluster was not VM-ready yet. We ditched Gluster and went with plain old proven iSCSI.

1 comments

This. We had the same failure mode, on a cluster with dozens of nodes and thousands of VMs.

http://thr3ads.net/gluster-users/2011/06/480298-Enomaly-user...

This bug (and others) would happen when moving files and/or folders on top of existing ones, e.g.:

  mkdir -p foo/bar/baz && \
  mkdir -p foo/bar/tmp && \
  touch foo/bar/baz/file foo/bar/tmp/file && \
  mv foo/bar/tmp/file foo/bar/baz/ && \
  rm -rf foo/bar/tmp
This was a method of replacing an existing lockfile. Mercurial uses some similar code. The expected behavior is that foo/bar/tmp/file replaces foo/bar/baz/file. Instead, there was a race in Gluster where it got confused about which version of 'file' was correct, and it would end up with a split brain between the two nodes. A node failure made this worse, but wasn't always required. Heavy load seemed to make the failure more likely.

We couldn't replicate the same bug by moving files within the same folder; it was moving between subfolders of the same Gluster fs that seemed to cause the issue.

The frustrating part was how Gluster pointed fingers at the file operation being incompatible, despite advertising a POSIX-compatible filesystem. Apparently the bug is fixed, but we moved off Gluster, never to return.
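For anyone who wants to poke at this kind of thing, here is a rough sketch of how the sequence above could be looped with a check on the result. It is not the script we actually ran: the function name `replace_check`, its arguments and the file contents are all made up for illustration. It only verifies what a single client sees (that the moved file replaced the old one), so it would not by itself detect a split brain between replicas.

```shell
#!/bin/sh
# Hypothetical stress loop (illustrative names, not the original script):
# repeat the move-over-an-existing-file sequence and check that the
# destination holds the new content after every round.
replace_check() {
    root=$1      # directory to test in, e.g. somewhere on the mounted volume
    rounds=$2    # number of iterations to run
    i=1
    while [ "$i" -le "$rounds" ]; do
        mkdir -p "$root/foo/bar/baz" "$root/foo/bar/tmp" || return 1
        echo "old-$i" > "$root/foo/bar/baz/file"
        echo "new-$i" > "$root/foo/bar/tmp/file"
        # Move the new file over the existing one, then drop the temp dir.
        mv "$root/foo/bar/tmp/file" "$root/foo/bar/baz/" || return 1
        rm -rf "$root/foo/bar/tmp"
        # Expected behavior: the destination now contains the new content.
        if [ "$(cat "$root/foo/bar/baz/file")" != "new-$i" ]; then
            echo "mismatch on round $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "ok: $rounds rounds"
}
```

Something like `replace_check /path/to/mounted/volume 1000` (the path is a placeholder), ideally from several clients at once and under load, would be the closest approximation of the conditions described above.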

Also, filesystems are hard, I get it, so no hard feelings :)