by mikeocool 618 days ago
I love all of the software coming out recently backed by simple object storage.

As someone who spent the last decade and a half getting alerts from RDBMSes, I’m basically at the point where if you think your system requires more than object storage for state management, I don’t want to be involved.

My last company looked at rolling out Elasticsearch/OpenSearch to take certain loads off our DB, but it became clear it was just going to be a second monstrously complicated system that would require a lot of care and feeding, and we were probably better off spending the time trying to squeeze some additional performance out of our DB.


This is very much the Unix philosophy, right? Everything is a file? [1]

[1] https://en.wikipedia.org/wiki/Everything_is_a_file

Not quite - "everything is a blob" has very different concurrency semantics to "everything is a POSIX file". You can't write into the middle of a blob, for example. This makes certain use cases harder but the concurrency of blobs is much easier to reason about and get right.
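
To make the "can't write into the middle of a blob" point concrete, here is a rough sketch using a toy in-memory store. `BlobStore`, `PreconditionFailed`, `patch_middle`, and the `if_match` parameter are all made up for illustration, and the sketch assumes the store offers some kind of conditional write, which not every blob store does. An "in-place" edit becomes read the whole object, splice, write the whole object back:

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when the blob changed since the caller last read it."""


class BlobStore:
    """Toy in-memory blob store (hypothetical): whole-object reads and writes only."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        # Return the full object plus a content hash used as its version tag.
        data = self._objects[key]
        return data, hashlib.sha256(data).hexdigest()

    def put(self, key, data, if_match=None):
        # Replace the whole object. With if_match set, the write only succeeds
        # if the stored object still has that tag (a compare-and-swap).
        if if_match is not None:
            current = self._objects.get(key)
            current_tag = (
                hashlib.sha256(current).hexdigest() if current is not None else None
            )
            if current_tag != if_match:
                raise PreconditionFailed(key)
        self._objects[key] = bytes(data)
        return hashlib.sha256(data).hexdigest()


def patch_middle(store, key, offset, patch):
    # No seek-and-write here: read everything, splice locally, write everything.
    data, tag = store.get(key)
    updated = data[:offset] + patch + data[offset + len(patch):]
    return store.put(key, updated, if_match=tag)
```

The upside is that a reader only ever sees the old object or the new one, never a half-written mix, and a writer holding a stale tag gets an error instead of silently clobbering someone else's update.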

Personally I think you might actually need a DB to do the work of a DB, and you can't as easily build one on top of a blob store as on a block device. But I do think most distributed systems should use blob and/or DB and not the filesystem.

On the other hand, the S3-compatible server options are quite limited. While you're not locking yourself into one cloud, you are locking yourself into the cloud.

At this point in my career, I’ve found that paying to make something hard someone else’s problem is often well worth it.

Why would you prefer state management in object storage vs a relational (or document) database?

Two main reasons I can see:

Ops is easier, for the most part. Doing ops on an RDBMS correctly can be a pain. Things like replication, failover, performance tuning, etc. can be hard. That said, this is much less of an issue these days, because services like RDS have solved it for a long time.

Splitting compute from storage makes scaling a lot easier, especially when storage is an object store where you don't have to worry about RAID, disk backups, etc. For clustered systems like Elasticsearch in particular, having object store backing would be incredible: if you need to spin up a new server, instead of starting it, convincing it to download the portions of the indexes it's supposed to hold, and waiting for everything to transfer, you just start it and let it run immediately. You can also now run 80% spot instances for your compute nodes, because if one gets recalled, the replacement doesn't have to sync all its state from the other servers; it can just go straight to business as usual. And a sudden loss of 60% of your nodes doesn't mean data loss like it does when your nodes are holding all the state.
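
A toy sketch of what that looks like, with everything here (`ObjectStore`, `ComputeNode`, `search`, the segment layout) invented for illustration rather than taken from how Elasticsearch or any real system works. The nodes keep nothing but a disposable cache, so losing most of them loses no data and a fresh node can serve right away:

```python
class ObjectStore:
    """Stand-in for a durable, shared object store (hypothetical, in-memory)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]

    def keys(self):
        return sorted(self._objects)


class ComputeNode:
    """Stateless worker: its cache is disposable, the store is the source of truth."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def lookup(self, segment, term):
        # Fetch the index segment on first use; nothing is synced at startup.
        if segment not in self.cache:
            self.cache[segment] = self.store.get(segment)
        return self.cache[segment].get(term, [])


def search(nodes, store, term):
    # Spread segments across whichever nodes are currently alive.
    hits = []
    for i, segment in enumerate(store.keys()):
        node = nodes[i % len(nodes)]
        hits.extend(node.lookup(segment, term))
    return sorted(hits)
```

Dropping three of five nodes and calling `search` again with the survivors returns the same results, since no node ever owned the data. The cost is that the first lookup on a cold node pays a round trip to the store.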

I think for something like an RDBMS, object-store backing is very likely complete overkill, unless you're hitting some scaling threshold that most of us never deal with. For clustered DB systems (Cassandra/Scylla, ES, etc.), splitting out storage makes cluster management, scalability, and resiliency worlds easier.

So many fewer moving parts to manage/break.