by koverstreet 574 days ago
Oh, that is tricky. If you want to play around with the algorithm that picks which device to read from, it's in fs/bcachefs/extents.c:

  static inline bool ptr_better(struct bch_fs *c,
                              const struct extent_ptr_decoded p1,
                              const struct extent_ptr_decoded p2)
  {
        if (likely(!p1.idx && !p2.idx)) {
                u64 l1 = dev_latency(c, p1.ptr.dev);
                u64 l2 = dev_latency(c, p2.ptr.dev);

                /* Pick at random, biased in favor of the faster device: */

                return bch2_rand_range(l1 + l2) > l1;
        }

        if (bch2_force_reconstruct_read)
                return p1.idx > p2.idx;

        return p1.idx < p2.idx;
  }
Perhaps just squaring the device latencies would balance things out more the way we want.
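
A rough sketch of what squaring would change (this is not code from bcachefs): assuming `bch2_rand_range(n)` returns a uniform value in [0, n), `ptr_better()` prefers `p1` with a probability of roughly l2 / (l1 + l2). The two helpers below, whose names are hypothetical, compute that probability for the current linear weighting and for squared latencies, ignoring the off-by-one that comes from integer ranges.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate probability that p1 is preferred under the current rule,
 * "bch2_rand_range(l1 + l2) > l1": p1 wins in proportion to the *other*
 * device's latency. Assumes the random value is uniform in [0, l1 + l2).
 */
double pick_p1_linear(uint64_t l1, uint64_t l2)
{
	assert(l1 + l2 > 0);
	return (double)l2 / (double)(l1 + l2);
}

/*
 * Same idea with squared latencies. The squares are computed in floating
 * point because l * l could overflow a u64 for large latency values.
 */
double pick_p1_squared(uint64_t l1, uint64_t l2)
{
	double s1 = (double)l1 * (double)l1;
	double s2 = (double)l2 * (double)l2;

	assert(s1 + s2 > 0);
	return s2 / (s1 + s2);
}
```

With l1 = 100 and l2 = 10000 (arbitrary units), the linear version sends about 1 read in 101 to the slower device, while the squared version sends about 1 in 10,001.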
1 comment

I remember this code!

If we're talking about my desktop, its current configuration is 3x 2TB NVMe (configured as ZFS cache) plus 2x 12TB HDDs (mirrored). I've set sync=disabled, with transaction groups committing every 10 minutes (this is fine for my use case), so the HDDs spend most of their time spun down.

I only actually have 4TB of data on the system. It keeps growing, but the working set is probably much less than that.

Which means it's 100% cached. A single read sent to the HDDs would have a latency of multiple seconds while the drives spin up, which is absolutely catastrophic for a desktop workload. In this case _always_ using the cache _is_ the right answer, but I've been trying to think of an algorithm that would be able to do so without hardcoding it.
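
To put rough numbers on this, the sketch below assumes an NVMe read latency of about 100 µs and a spun-down HDD read latency of about 5 s. Both figures are illustrative guesses, not measurements from this system. The hypothetical helper `reads_per_slow_pick()` computes how many reads are issued, on average, per read that lands on the slow device when devices are picked at random with weights based on latency raised to a given power.

```c
#include <assert.h>

/*
 * Hypothetical helper: average number of reads issued per read that lands
 * on the slow device, when the slow device is chosen with probability
 * fast^k / (fast^k + slow^k). k = 1 is latency-weighted random choice,
 * k = 2 is the "square the latencies" variant.
 */
double reads_per_slow_pick(double fast_us, double slow_us, int exponent)
{
	double wf = 1.0, ws = 1.0;

	assert(fast_us > 0 && slow_us > 0 && exponent >= 1);
	for (int i = 0; i < exponent; i++) {
		wf *= fast_us;
		ws *= slow_us;
	}
	/* P(slow) = wf / (wf + ws), so the mean gap is its reciprocal. */
	return (wf + ws) / wf;
}
```

Under those assumed latencies, `reads_per_slow_pick(100.0, 5e6, 1)` gives about one read in 50,001 going to the HDDs, and an exponent of 2 gives about one in 2.5 billion. Either way the probability is small but never zero, so weighting alone does not amount to always using the cache.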