Hacker News

	by riku_iki 387 days ago
	> One day, the system froze and after reboot the partition was unrecoverably gone (the whole story[1]).

	It looks like you didn't use RAID, so any FS could fail in case of disk corruption.

sandreas 387 days ago

Thank you for your opinion. Well... it did not just fail. Cryptsetup mounted everything fine, but the BTRFS tools did not find a valid filesystem on it.

While it could have been a bit flip that destroyed the whole encryption layer, BTRFS debugging revealed that there were some traces of BTRFS headers after opening the volume with cryptsetup, and some of the data on the decrypted partition was there...

This probably means the encryption layer was fine. The BTRFS part just could not be repaired or restored. The only explanation I have is that something caused a dirty write, which destroyed the whole partition table, the backup partition table and, since I used subvolumes and could not restore anything, most of the data.

Well, maybe it was my fault but since I'm using the exact same system with the same hardware right now (same NVMe SSD), I really doubt that.

riku_iki 386 days ago

> Well, maybe it was my fault but since I'm using the exact same system with the same hardware right now (same NVMe SSD), I really doubt that.

Anecdotes can go both ways: I have been running heavy data processing at maximum throughput on top of btrfs RAID for 10 years, and have never had any data loss. I am absolutely certain that if you expect data integrity while relying on a single disk, it is your fault.

theamk 386 days ago

Reliability is about the variety of workloads, not the amount of data or throughput. It's easy to write a filesystem that works well in the ideal case; it's the bad or unusual traffic patterns that cause problems. For all I know, that complete btrfs failure was caused by a kernel crash due to bad USB hardware, or by a cosmic ray hitting a memory chip.

But you know whose fault it is? Btrfs's. Other filesystems don't lose entire volumes that easily.

Over time, I've abused ext4 (and ext3) in all sorts of ways: overwriting random sectors, mounting twice (via loop, so the kernel's double-mount detection did not kick in), using bad SATA hardware that introduced bit errors... There was some data loss, and sometimes I had to manually sort through tens of thousands of files in "lost+found", but I did not lose the entire filesystem.

The only time I saw an "entire partition loss" was when we tried btrfs. It was part of a Ceph cluster, so no actual data was lost... but as you may guess we did not use btrfs ever again.

riku_iki 386 days ago

> but as you may guess we did not use btrfs ever again.

There are scenarios where btrfs currently can't be replaced: high performance + data compression.

theamk 386 days ago

Sure, I can believe this. That doesn't change the fact that some people encounter complete data loss with it.

Sadly, there are people (and distributions) who recommend btrfs as a general-purpose root filesystem, even in cases where reliability matters much more than performance. I think that part is a mistake.

riku_iki 386 days ago

I would recommend btrfs as a general-purpose root filesystem. With any FS, some people will encounter data loss. I can believe btrfs has an N-times higher chance of data loss because it's packed with features and needs to maintain various complicated indexes that are easier to corrupt. But I also believe one should be ready for their disk to fail at any minute, regardless of FS, and do backups/replication accordingly.

mdedetrich 385 days ago

OpenZFS does a better job here, at least if you can deal with an out-of-tree filesystem.

riku_iki 385 days ago

Actually, my personal benchmarks and multiple accounts on the internet say it is much slower than btrfs under load.

ahofmann 386 days ago

What the hell are you talking about? Every filesystem on every OS I've seen in the last 3 decades has had some kind of recovery path after a crash. Some of them lose more data, some of them less. But being unable to mount is a bug that makes a filesystem untrustworthy and useless.

And how would RAID help in that situation?

riku_iki 386 days ago

> But being unable to mount is a bug that makes a filesystem untrustworthy and useless.

We disagree on this. If a partition table entry is corrupted, you can't mount without some low-level surgery.

> And how would RAID help in that situation?

Depending on the RAID level, your data will be duplicated on another disk and will survive the corruption of one or a few disks.

ahofmann 386 days ago

The partition table is usually written only once in the lifetime of a filesystem/disk, so it almost never gets corrupted during an OS crash.

There are a lot of RAID levels and configurations. Some of them may do what you describe, but most don't.

riku_iki 386 days ago

OK, I know this; I'm not sure what your point is.