by acidmath
442 days ago
> object recovery is quite manual if you lose any

When I read this I think "but you should never lose an object". Do you mean like the underlying data chunks Ceph stores? Can you elaborate on this part? I know some of the teams I work with do things in unorthodox ways and we tend to operate on different assumptions than others.

> so pg’s can be spread across racks/datacenters.

Some Ceph pools come to mind (this was a while ago, I'm sure they're still running though) where the erasure coding was done across cabinet rows and each cabinet row was on its own power distribution. I don't know how the power worked but I was told rather forwardly that some specific Ceph pools' failure domains aligned with the datacenter's failure domains.

> We already have a loki setup

Nice. We have logs go into S3 and then anyone who prefers a particular tool is welcome to load whatever sets of logs from S3 within the resource limits set for whatever K8s namespace they work with. Originally keeping logs append-only in S3 was for compliance but we wanted to limit team members by RAM quota rather than tools in line with the "people over tools over process" DevOps maxim.
|
Say I 3x replicate data across racks and I have 3 concurrent rack failures where the stars align and I lose data. What do I do? I may want to make the tradeoff to have lower durability (say replicas are located within the same networking pod) for better performance due to lower latency between replicas. In that case maybe I am fine losing data once in a blue moon.
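
To make the "stars align" case concrete, here is a rough Python sketch of which placement groups lose every replica when a given set of racks fails. It is a toy model, not Ceph's CRUSH algorithm: `place_across_racks`, `lost_pgs`, the rack names, and the PG count are all made up for illustration.

```python
from itertools import combinations


def place_across_racks(num_pgs, racks, replicas=3):
    """Toy placement (NOT CRUSH): give each PG `replicas` distinct racks,
    cycling through every possible rack combination."""
    combos = list(combinations(racks, replicas))
    return {f"pg{i}": combos[i % len(combos)] for i in range(num_pgs)}


def lost_pgs(placement, failed_racks):
    """Return the PGs whose replicas all sit in failed racks."""
    failed = set(failed_racks)
    return sorted(pg for pg, racks in placement.items() if set(racks) <= failed)


# Hypothetical cluster: 5 racks, 20 placement groups, 3 replicas each.
racks = ["r1", "r2", "r3", "r4", "r5"]
placement = place_across_racks(20, racks)

# Two concurrent rack failures never take out all 3 replicas of a PG.
print(lost_pgs(placement, ["r1", "r2"]))

# Three concurrent failures only hurt the PGs placed on exactly those racks.
print(lost_pgs(placement, ["r1", "r2", "r3"]))
```

In this model only the PGs that happen to live on exactly the three failed racks are gone, so the question is really what to do about that small slice while the rest of the pool stays healthy.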