Hacker News
by 0xbadcafebee 1989 days ago
I am missing a lot of context from this post because this just sounds nonsensical.

First they're conflating storage with transport. SQL databases are a storage and query system. They're intended to be slow, but efficient, like a bodybuilder. You don't ask a bodybuilder to run the 500m dash.

Second, they had a 150MB dataset, and they moved to... a distributed decentralized key-value store? They went from the simplest thing imaginable to the most complicated thing imaginable. I guess SQL is just complex in a direct way, and etcd is complex in an indirect way. But the end results of both are drastically different. And doesn't etcd have a whole lot of functional limitations SQL databases don't? Not to mention its dependence on gRPC makes it a PITA to work with REST APIs. Consul has a much better general-purpose design, imo.

And more of it doesn't make sense. Is this a backend component? Client side, server side? Why was it using JSON if resources mattered (you coulda saved like 20% of that 150MB with something less bloated). Why a single process? Why global locks? Like, I really don't understand the implementation at all. It seems like they threw away a common-sense solution to make a weird toy.

I'd answer questions but I'm not sure where to start.

I think we're pretty well aware of the pros and cons of all the options and between the team members designing this we have pretty good experience with all of them. But it's entirely possible we didn't communicate the design constraints well enough. (More: https://news.ycombinator.com/item?id=25769320)

Our data's tiny. We don't want to do anything to access it. It's nice just having it in memory always.
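Here is a minimal sketch of the pattern being debated: the whole dataset lives in memory in a single process, one global lock guards it, and each write persists a full JSON snapshot to disk. It assumes nothing about the actual implementation beyond what's said in this thread. The class name `InMemoryStore` and the file name `state.json` are hypothetical.

```python
import json
import os
import tempfile
import threading


class InMemoryStore:
    """Hypothetical sketch: all data in one dict behind one global lock,
    persisted as a complete JSON snapshot on every write."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()  # single global lock for reads and writes
        self._data = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._flush_locked()

    def _flush_locked(self):
        # Write to a temp file in the same directory, then atomically swap it
        # in, so a crash mid-write never leaves a truncated snapshot behind.
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._path)


# Usage: write once, then reopen from disk to confirm the snapshot round-trips.
path = os.path.join(tempfile.mkdtemp(), "state.json")
store = InMemoryStore(path)
store.put("node1", {"name": "alpha"})
print(InMemoryStore(path).get("node1"))  # {'name': 'alpha'}
```

Rewriting the whole snapshot on every `put` is only reasonable while the dataset stays small enough to serialize quickly. That trade-off is what this thread is arguing about.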

Architecturally, see https://news.ycombinator.com/item?id=25768146

JSON vs compressed JSON isn't the point: see https://news.ycombinator.com/item?id=25768771 and my reply to it.

You say you want to have a small database just for each control process deployment to be independent. But you need multiple nodes for etcd... So you currently have either a shared database for all control processes, or 3 nodes per control process, or 3 processes per control node, etc. Either way it seems weird.
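The "3 nodes" figure comes from etcd's use of Raft consensus. A write commits only once a majority of voting members acknowledge it, so the cluster can lose only a minority of members and stay available. This quick sketch works through the majority arithmetic. The function names are illustrative and not part of etcd.

```python
def quorum(members: int) -> int:
    """Votes needed for a Raft majority in a cluster of `members` voters."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in range(1, 6):
    print(n, quorum(n), failures_tolerated(n))
# prints: 1 1 0 / 2 2 0 / 3 2 1 / 4 3 1 / 5 3 2 (one line each)
```

Three is the smallest cluster that survives losing a member. Going from 3 to 4 members raises the quorum without tolerating any additional failure, which is why odd cluster sizes are the usual recommendation.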

I get that SQLite wouldn't work, but it also doesn't make sense to have one completely independent database per process. So I imagine you're using a shared database, at which point etcd starts to make more sense. It's just not as widely understood in production as SQL databases, and has limitations which you might reach in a few years.

> It's just not as widely understood in production as SQL databases, and has limitations which you might reach in a few years.

Reaching limitations in a few years and biting that bullet makes the difference between a successful startup that knows when and where to spend time innovating and a startup that spends all their time optimizing for that 1 million simultaneous requests/sec.

It's not about optimizing for scale, it's about optimizing for velocity. I don't care if I can only get to 1K RPS. I care if my team and product can work quickly. You cannot work quickly later if you slap something together now and later realize, oh shit, we have to stop pushing features for a month so we can completely rebuild the backend and everything that depends on it.

It's the devil you know versus the devil you don't. SQL is a very well understood devil, so your plans around it will be reliable. I would argue that being able to accurately estimate future work is the most valuable business asset.