by thecleaner 1621 days ago
Please don't bother with the whole charade of "you have to do it to learn it". This kind of advice places more importance on people working at internet companies which have achieved scale. If you are at one of these companies then you have nothing to worry about, experience will teach you. If you are not, follow @cppr's advice and read through DDIA. It will give you a good enough introduction to data systems - not just databases. Then I would suggest learning how message queues work, since that part isn't covered in the book. Do a lot of mock designs and attack your designs with different load parameters - how much data do I store, how many machines do I need (if more than one is necessary), can I do it simpler. Honestly, if you can estimate storage and throughput well then you are fine. Pick up some distributed systems papers if you are very interested. Once you have these down you will be able to pick more topics yourself. Good luck. Keep learning.
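For what it's worth, the "estimate storage and throughput" exercise can be as simple as a few multiplications. A rough sketch in Python, where the function name, the parameters and every number in the example are made up for illustration (not from DDIA or any real system):

```python
import math

def estimate_capacity(writes_per_sec, bytes_per_record, retention_days,
                      replication_factor, disk_bytes_per_machine):
    """Back-of-envelope storage and write-bandwidth estimate.

    All inputs are assumptions you plug in for a mock design.
    """
    seconds_retained = retention_days * 86_400
    # Raw data written over the retention window, before replication.
    raw_bytes = writes_per_sec * bytes_per_record * seconds_retained
    # Every replica stores a full copy.
    total_bytes = raw_bytes * replication_factor
    # Machines needed for storage alone (ignores CPU, RAM, headroom).
    machines = math.ceil(total_bytes / disk_bytes_per_machine)
    # Sustained write bandwidth across the whole cluster.
    write_bytes_per_sec = writes_per_sec * bytes_per_record * replication_factor
    return {
        "total_bytes": total_bytes,
        "machines_for_storage": machines,
        "write_bytes_per_sec": write_bytes_per_sec,
    }

# Hypothetical load: 1,000 writes/s of 1 kB records, kept for 10 days,
# replicated 3 ways, on machines with 1 TB of usable disk each.
estimate = estimate_capacity(1_000, 1_000, 10, 3, 10**12)
```

Change one parameter at a time (10x the writes, 10x the retention) and see which number blows up first. That is most of what "attacking the design" means here.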

One topic I haven't seen widely discussed in books, but which happens in the real world, is back pressure. I think you're getting at this with "attack your designs", though I find it hard to predict.

Have you any thoughts, books or resources you turn to, to make sure the whole pipeline is balanced - or doesn't fall over when one part is overloaded?

You need to see what happens in your data processing chain. The pipeline is only "balanced" relative to the RPS you want. Throughput calculations are easier to do. Latencies are harder to predict, but honestly latency calculations without sensible benchmarks are horseshit.
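To make the throughput part concrete: a chain of stages can sustain no more than its slowest stage. A minimal sketch, assuming each stage is described by a worker count and a per-worker rate that you measured or guessed (the stage names and numbers are hypothetical):

```python
def pipeline_throughput(stages):
    """Return (bottleneck_stage, sustainable_rps) for a linear pipeline.

    stages maps a stage name to (workers, items_per_sec_per_worker).
    """
    # Capacity of each stage if it were running alone.
    capacity = {name: workers * rate for name, (workers, rate) in stages.items()}
    # The chain as a whole is limited by the stage with the least capacity.
    bottleneck = min(capacity, key=capacity.get)
    return bottleneck, capacity[bottleneck]

# Hypothetical three-stage pipeline.
stages = {
    "ingest": (4, 100),     # 4 workers x 100 items/s = 400 items/s
    "transform": (2, 20),   # 2 workers x 20 items/s  = 40 items/s
    "store": (8, 50),       # 8 workers x 50 items/s  = 400 items/s
}
bottleneck, rps = pipeline_throughput(stages)
target_rps = 100
is_balanced = rps >= target_rps  # False here: "transform" caps the chain at 40
```

This says nothing about latency, which is the point above: that you have to benchmark.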

Back pressure is refusing to let the queue grow unbounded, or to take in more requests than there are slots available. That's pretty much it.
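A minimal sketch of that idea using Python's standard `queue.Queue` with a `maxsize`. The `BoundedIntake` class and its method names are invented for illustration, and rejecting outright is only one policy (blocking the producer is another):

```python
import queue

class BoundedIntake:
    """Accept work only while there is a free slot; otherwise reject."""

    def __init__(self, slots):
        if slots <= 0:
            # queue.Queue treats maxsize <= 0 as "infinite", which is
            # exactly the unbounded queue we are trying to avoid.
            raise ValueError("slots must be positive")
        self._q = queue.Queue(maxsize=slots)
        self.rejected = 0

    def submit(self, request):
        """Return True if accepted, False if the caller should back off."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest request (raises queue.Empty if none)."""
        return self._q.get_nowait()

intake = BoundedIntake(slots=2)
results = [intake.submit(i) for i in range(3)]  # third request finds no free slot
```

The `False` return is the back pressure signal: the caller has to slow down, retry later, or shed the request, instead of the queue quietly growing until something falls over.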