by manuel_w 638 days ago
Discussions on checksumming filesystems usually revolve around ZFS and BTRFS, but does anyone have any experience with bcachefs? It has been upstreamed into the Linux kernel, I learned, and is supposed to have full checksumming. The author also seems to take filesystem responsibility seriously.

Is anyone using it around here?

https://bcachefs.org/

6 comments

I tried it out on my homelab server right after the merge into the Linux kernel.

It took roughly one week for the whole RAID to stop mounting because of the journal (8 HDDs, 2 SSDs as write cache, 2 NVMe drives as read cache).

The author responded on Reddit within a day. I tried his fix (which meant compiling the Linux kernel and booting from that), but it didn't resolve the issue. He sadly didn't respond after that, so I wiped and switched back to a plain mdadm RAID after a few days of waiting.

I had everything important backed up, obviously (though I did lose some unimportant data), but it did remind me that bleeding edge is indeed ... unstable.

The setup process and features are fantastic, however: simply being able to add a disk and flag it as read/write cache feels great. I'm certain I'll give it another try in a few years, after it has had some time in the oven.

New filesystems seem to have a chicken-and-egg problem, really. It's not like switching from Nvidia's proprietary drivers to nouveau and then back if it turns out they don't work that well. Switching filesystems, especially in larger RAID setups where you desperately need more testing and real-world usage feedback, is pretty involved, and even if you have everything backed up, restoring everything is pretty time-consuming should things go haywire.

And even if you have the time and patience to be one of these early adopters, debugging any issues encountered might also be difficult, as ideally you want to give the devs full access to your filesystem for debugging and attempted fixes, which is obviously not always feasible.

So anything beyond the most trivial setups and usage patterns gets a minuscule amount of testing.

In an ideal world, you'd nail your FS design first try, make no mistakes during implementation and call it a day. I'd like to live in an ideal world.

> In an ideal world, you'd nail your FS design first try, make no mistakes during implementation and call it a day

Crypto implementations and FS implementations strike me as the ideal audience for actually investing the mental energy in the healthy ecosystem we have of modeling and correctness verification systems

Now, I readily admit that I could be talking out of my ass, given that I've not tried to use those verification systems in anger, as I am not in the crypto (or FS) authoring space but AWS uses formal verification for their ... fork? ... of BoringSSL et al https://github.com/awslabs/aws-lc-verification#aws-libcrypto...

A major chunk of storage reliability is all these weird and unexpected failure modes and edge cases which are not possible to prepare for, let alone write fixed specs for. Software correctness assumes the underlying system behaves correctly and stays fixed, which is not the case. You can't trust the hardware and the systems are too diverse - this is the worst case for formal verification.

That was a decision Linus regretted[1]. There has been some recent discussion about this here on Hacker News[2].

[1] https://linuxiac.com/torvalds-expresses-regret-over-merging-...

[2] https://news.ycombinator.com/item?id=41407768

Context: Linus regrets it because bcachefs doesn't have the same commitment to stability as Linux.

Kent wants to fix a bug with a large PR.

Linus doesn't want to review and merge a PR that touches so many non-bcachefs things.

They're both right in a way. Kent wants bcachefs to be stable and work well; Linus wants Linux to be stable.

Edit: replied to wrong person. I agree with you.

Kent from bcachefs was just late in the cycle, somewhere in rc5. That was indeed too late for such a huge push of new code touching so many things.

There is some tension but there is no drama and implying so is annoying.

Bcachefs is going places, I think I’d already choose it over btrfs atm.

After reading the email chain I have to say my enthusiasm for bcachefs has diminished significantly. I had no idea Kent was that stubborn; he seems to have little respect for Linus or his rules.

As usual, the top comments in that submission are very biased. I think HN should sort comments in a random order in every polarizing discussion. Anyone reading this, do yourself a favor and dig through both links, or ignore the parent's comment altogether.

Linus "regretted" it in the sense "it was a bit too early because bcachefs is moving at such a fast speed", and not in the sense "we got a second btrfs that eats your data for lunch".

Please provide context and/or short human-friendly explanation, because I'm pretty sure most readers won't go further than your comment and will remember it as "Linus regrets merging bcachefs", helping spread FUD for years down the line.

You're saying this like the takeaway of "Linus regrets merging bcachefs" is unfair when the literal quote from Linus is "[...] I'm starting to regret merging bcachefs." And earlier he says "Nobody sane uses bcachefs and expects it to be stable[...]".

I don't understand how you can read Linus' response and think "Linus regrets merging bcachefs" is an unfair assessment.

What attachment to bcachefs do you have? The concerns are valid, and at first I didn't read it as Linus not wanting another btrfs. But now thinking about it, why do we have another competing filesystem being developed at this point at all?

Well. Point taken. You have an important core of truth to your argument about polarization.

But...

Strongly disagree.

I think that is a very unfair reading of what I wrote. I feel that you might have a bias which shows but that would be the same class of ad hominem as you have just displayed. That is why I choose to react even though it might be wise to let sleeping dogs lie. We should minimize polarization but not to a degree where we cannot have civilized disagreement. You are then doing exactly what you preach not to do. Is that then FUD with FUD on top? Two wrongs make a right?

I was reacting to the implicit approval in mentioning that it had been upstreamed in the kernel. That was the reason for the first link. Regrets were clearly expressed.

Another HN trope is rehashing the same discussions over and over again. That was the reason for the second link. I would like to avoid yet another discussion on a topic which was put into light less than 14 days ago. Putting that more bluntly would have been impolite and polarizing. Yet here I am.

The sad part is that my point got through to you loud and clear. Sad because rather than simply dismissing as polarizing that would have been a great opener for a discussion. Especially in the context of ZFS and durability.

You wrote:

> Linus "regretted" it in the sense "it was a bit too early because bcachefs is moving at such a fast speed", and not in the sense "we got a second btrfs that eats your data for lunch".

If you allow me a little lighthearted response. The first thing which comes to mind was the "They're the same picture" meme[1] from The Office. Some like to move quickly and break things. That is a reasonable point of view. But context matters. For long term data storage I am much more conservative. So while you might disagree; to me it is the exact same picture.

Hence I very much object to what I feel is an ad hominem attack because your own worldview was not reflected suitably in my response. It is fair critique that you feel it is FUD. I do however find it warranted for a filesystem which is marked experimental. It might be the bee's knees but in my mind it is not ready for mainstream use. Yet.

That is an important perspective for the OP to have. If the OP just wants to play around, all is good. If the OP does not mind moving quickly and breaking things, fine. But for production use? Not there yet. Not in my world.

Telling people to ignore my comment because you know people cannot be bothered to actually read the links? And then lecturing me that people might take the wrong spin on it? Please!

[1] https://knowyourmeme.com/memes/theyre-the-same-picture

It is marked experimental, and since it was merged into the kernel there have been a few major issues that have been resolved. I wouldn't risk production data on it, but for a home lab it could be fine. But you need to ask yourself, how much time are you willing to spend if something should go wrong? I have also been running ZFS for 15+ years, and I've seen a lot of crap because of bad hardware. But with good enterprise hardware it has been working flawlessly.

I'm using it. It's been ok so far, but you should have all your data backed up anyway, just in case.

I'm trying a combination where I have an SSD (about 2 TiB) in front of a big hard drive (about 8 TiB), using the SSD as a cache.

I do this on my Synology using btrfs. I'm still not convinced SSD caching gives any benefit for a home user. 5 spindle drives can already read and write faster than line rate on the NIC (1gbe) so what is the point of adding another failure point?

> 5 spindle drives can already read and write faster than line rate on the NIC (1gbe) so what is the point of adding another failure point?

SSDs are more about latency than throughput. (And who wants to deal with five spindle drives in a desktop computer?)

In any case, in my case I had the SSD first and bought the HDD to expand my storage capacity.

I don't know whether your use case cares about latency, or about the number of drives. Your trade-offs might be different from mine.
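
A rough sketch of the throughput-versus-latency distinction being argued here, in Python. Only the 1 GbE line rate comes from the discussion; the per-drive figures (150 MB/s sequential, 10 ms and 0.1 ms per random read) are assumptions picked for illustration, not measurements from either setup in this thread.

```python
# Back-of-the-envelope numbers. Every drive figure below is an assumption.


def line_rate_mb_per_s(bits_per_s: float) -> float:
    """Convert a nominal link speed in bits/s to MB/s (1 MB = 10**6 bytes)."""
    return bits_per_s / 8 / 1_000_000


def serial_random_read_seconds(num_reads: int, latency_ms: float) -> float:
    """Total time for num_reads random reads issued strictly one after another."""
    return num_reads * latency_ms / 1000


GBE_BITS_PER_S = 1_000_000_000   # 1 GbE, as mentioned in the thread
ASSUMED_HDD_SEQ_MB_PER_S = 150   # assumed sequential rate of one spinning disk
ASSUMED_HDD_LATENCY_MS = 10.0    # assumed seek plus rotational delay
ASSUMED_SSD_LATENCY_MS = 0.1     # assumed SSD random-read latency

if __name__ == "__main__":
    nic = line_rate_mb_per_s(GBE_BITS_PER_S)
    print(f"1 GbE carries at most about {nic:.0f} MB/s")
    print(f"one assumed HDD streams about {ASSUMED_HDD_SEQ_MB_PER_S} MB/s")
    for name, latency in (("HDD", ASSUMED_HDD_LATENCY_MS),
                          ("SSD", ASSUMED_SSD_LATENCY_MS)):
        total = serial_random_read_seconds(10_000, latency)
        print(f"10,000 serial random reads on {name}: about {total:.0f} s")
```

Under these assumptions, large sequential transfers are limited by the network link on either kind of device, while many small dependent reads are where the two differ.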

You manage 5 discs in a device because you care about data protection, being safer than a single disc.

Yes SSDs in theory are faster but you are only as fast as your slowest link, which is the spindle drive. so that cache is a buffer only for frequently read data. in home environments they’re next to useless. in enterprises they’re certainly useful.

> Yes SSDs in theory are faster but you are only as fast as your slowest link, which is the spindle drive. so that cache is a buffer only for frequently read data. in home environments they’re next to useless.

If you check the numbers I gave above, I have 2 TiB SSD and 8 TiB hard disk. My 'frequently read data' is basically all the data I care about accessing. The other 8 TiB is mostly for eg steam games I installed and forgot about or for additional backups of some data from cloud services, like Google Photos. These are mostly write-once-read-never.

And eg if I happen to access a steam game that's currently on the HDD, it will quickly migrate to the SSD.

My 'working set' of data is certainly smaller than 2 TiB.
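
The working-set argument can be put as a weighted average. This is a minimal Python sketch, assuming a simple two-tier model in which every read is either an SSD hit or an HDD miss. The latencies are illustrative assumptions, and `average_read_latency_ms` is a hypothetical helper, not anything bcachefs exposes.

```python
def average_read_latency_ms(hit_ratio: float,
                            ssd_latency_ms: float,
                            hdd_latency_ms: float) -> float:
    """Expected latency of one read when a fraction hit_ratio is served by the SSD.

    Two-tier model: a hit costs ssd_latency_ms, a miss costs hdd_latency_ms.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ssd_latency_ms + (1.0 - hit_ratio) * hdd_latency_ms


ASSUMED_SSD_LATENCY_MS = 0.1   # assumption, not a measurement
ASSUMED_HDD_LATENCY_MS = 10.0  # assumption, not a measurement

if __name__ == "__main__":
    for hit_ratio in (0.0, 0.5, 0.9, 0.99, 1.0):
        avg = average_read_latency_ms(hit_ratio,
                                      ASSUMED_SSD_LATENCY_MS,
                                      ASSUMED_HDD_LATENCY_MS)
        print(f"hit ratio {hit_ratio:.2f}: about {avg:.2f} ms per read")
```

A miss still costs the full hard-disk latency in this model, so the benefit depends entirely on how much of the read traffic falls inside the working set held on the SSD.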

Five disks are safer than a single disk (if you store things multiple times or with erasure coding), but if you stick all five disks in a single device, the safety gains are rather more limited.

yea so again the single spindle drive is slowing it down. the spindle drive doesn’t get faster because it has an SSD, if you read something from the spindle it will read at the same speed the spindle is rated at. after it’s loaded into the SSD then it’s faster. but only then

> Five disks are safer than a single disk (if you store things multiple times or with erasure coding), but if you stick all five disks in a single device, the safety gains are rather more limited.

There are devices called storage servers or NAS. this idea of not putting more than one spindle in a machine is foreign to me.

I'm optimistic about it, but probably won't switch over my home lab for a while. I've had quirks with my (now legacy) zsys + zfs on root for Ubuntu, but since it's a common config that has been widely used for years, it's pretty easy to find support.

I probably won't use bcachefs until a similar level of adoption/community support exists.

Can't comment on bcachefs (I think it's still early), but I've been running with bcache in production on one "canary" machine for years, and it's been rock-solid.