by danpalmer
1815 days ago
I think it's important to know about the different data storage options and their trade-offs. Managing state is one of the hardest parts of backend development, particularly at scale, so it's important to understand the trade-offs between databases, caches, blob storage and queues, and when each is useful.

I'd pay close attention to speed and "correctness". What's the consistency model of a system? Can we lose data, and if so, how? What's the throughput? The latency? The answers help you choose good technology for backend systems, and help answer questions like:

- Can we do this in-band while serving a user request?
- Can we do it 100 times to serve a request?
- If it completes successfully, can we trust it, or do we still need to handle failure?
- Can we trust it immediately or only eventually?

There are lots of technologies and terms for all of this, but I've specifically avoided them because the important bit is the mental model of how these things fit together and what they allow or prevent.
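To make the "can we do it 100 times to serve a request?" question concrete, here's a rough back-of-the-envelope sketch in Python. The latency figures, the 200 ms budget and the names `P99_MS` and `fits_in_budget` are all made up for illustration; you'd plug in numbers measured on your own systems.

```python
# Hypothetical p99 latencies in milliseconds. These are illustrative
# guesses, not measurements of any real system.
P99_MS = {
    "in_process_cache": 0.01,
    "network_cache": 1.0,
    "database_query": 5.0,
    "blob_store_read": 50.0,
}

# Assumed end-to-end budget for serving one user request.
BUDGET_MS = 200.0


def fits_in_budget(latency_ms: float, calls: int, budget_ms: float = BUDGET_MS) -> bool:
    """Return True if `calls` sequential operations fit inside the budget."""
    return latency_ms * calls <= budget_ms


def report(calls: int) -> dict:
    """Map each storage option to whether `calls` sequential uses fit the budget."""
    return {name: fits_in_budget(ms, calls) for name, ms in P99_MS.items()}


if __name__ == "__main__":
    for calls in (1, 100):
        print(calls, report(calls))
```

With these assumed numbers, one database query in-band is fine, but 100 sequential queries blow the budget, which is the kind of answer that pushes you towards batching, caching or doing the work out-of-band. It only covers speed; the "can we trust it?" questions need the consistency and durability guarantees of the specific system.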
|
Also, I’ve had to explain this to so many other engineers, both junior and senior to me: most data is inherently relational. This next statement is a bit opinionated: 9 times out of 10 you probably want an RDBMS. I’ve seen so many attempts to shoehorn some Elasticsearch/Mongo/Neo4j/whatever database into a design because the developer wanted to work on CoolDatabaseTech. Then you’re stuck doing joins in CoolDatabase that it wasn’t really designed for, and frustrated at CoolDatabase’s lack of drivers for language X. Later on you’re dealing with stability and scalability issues you would never see with BattleTestedRDBMS.
The amount of work a well-designed Postgres instance can handle is insane. I’ve seen a single vertically scaled Postgres instance compete with 100+ node Spark clusters on computations.
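For anyone who hasn't felt the "most data is relational" point yet, here's a minimal sketch of the kind of question that's one join in an RDBMS. It uses Python's built-in `sqlite3` with an in-memory database purely so it runs anywhere (it's a stand-in, not Postgres), and the `users`/`orders` schema and function names are invented for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users/orders schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            amount_cents INTEGER NOT NULL
        );
        INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders (user_id, amount_cents) VALUES (1, 1200), (1, 800), (2, 500);
        """
    )
    return conn


def total_spent_per_user(conn: sqlite3.Connection) -> list:
    """Join users to their orders and sum the amounts, including users with no orders."""
    return conn.execute(
        """
        SELECT u.name, COALESCE(SUM(o.amount_cents), 0) AS total_cents
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, total_cents in total_spent_per_user(build_demo_db()):
        print(name, total_cents)
```

In a store without joins you'd typically end up doing this in application code (fetch users, fetch orders, stitch them together) or denormalising up front, which is exactly the shoehorning I'm complaining about.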