Hacker News
Ask HN: How can I gain some real distributed systems experience outside work?
16 points by multiplied 995 days ago
My current work doesn't involve that kind of scale or the challenges that go with it.

I could cook up some projects, I suppose, and flood them with dummy data. Just curious if there are any better ideas?

3 comments

Go and do some work on a heavily-used open source system that deals with distributed stuff, such as DNS/BIND or BGP or ActivityPub...
Thanks for those examples, that's exactly what I'm missing. If you know of any more, please do share.
As well as USENET that I mentioned elsewhere, a slightly less obvious set of distributed algorithms are those that, e.g., govern how small generators such as solar PV inverters sync to a grid, drop off if anything goes horribly wrong, and ride through smaller bumps or fail to (such failures have caused significant upsets in the UK and Texas recently, for example, so the algorithms/specs got tweaked). Indeed, synchronous AC grids are huge realtime distributed systems, some parts literally steam powered, some parts software (including AI and the financial markets), and they are the largest machines that we have. I spend way too much time tinkering at the edges of this sort of stuff, e.g. https://www.earth.org.uk/note-on-solar-DHW-for-16WW-UniQ-and...
Another is TCP congestion control including new variants such as BBR2 and relatives in QUIC:

https://www.theregister.com/2023/09/24/tcp_congestion_contro...

You can probably get to hack on some of that work!

Distributed systems arise everywhere. Even multi-core CPUs are distributed systems.

Scale is not as much an issue as ensuring correctness. Projects with scale don’t necessarily have interesting distributed systems, and often merely involve using open source solutions.

As long as you build networked systems and learn to recognize the potential distributed systems issues, you’ll find them everywhere. And if you fix them properly, you will get the practice you need.

I already work on those. But scale does bring in a new set of challenges, and I am specifically looking for experience with that.
What kind of distributed systems work?

Usenet (servers) are as much of a distributed system as a Hadoop cluster.

The same can be said for e.g., DNS vs. Spark.

Handling large data, basically. I already work with small-data distributed systems.
The flooding algorithm of USENET is a wonder. It is still worth understanding how it actually avoided endlessly sending the same articles around in loops, for example.

(I used to run one of the world's busiest USENET nodes from my house for a short while! I did not know that it was possible to physically wear out hard discs until then...)
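
If you want to play with the loop-avoidance idea, here is a toy sketch in Python. It assumes the two mechanisms usually described for USENET flooding: every node remembers the Message-IDs it has already accepted, and a node does not offer an article to a peer that already appears in the article's Path. The names (Node, receive, link) and the three-node ring are made up for illustration and are not taken from any real news server.

```python
class Node:
    """Toy news node: floods articles to peers, but never loops."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = set()   # history of Message-IDs already accepted
        self.received = 0   # every offer counted, duplicates included

    def receive(self, msg_id, path):
        self.received += 1
        if msg_id in self.seen:
            return          # duplicate: drop it, do not forward again
        self.seen.add(msg_id)
        new_path = [self.name] + path   # prepend ourselves to the Path
        for peer in self.peers:
            # Skip peers the article has already travelled through.
            if peer.name not in new_path:
                peer.receive(msg_id, new_path)


def link(x, y):
    """Make x and y peers of each other (hypothetical helper)."""
    x.peers.append(y)
    y.peers.append(x)


# A ring of three nodes, i.e. a topology that contains a loop.
a, b, c = Node("a"), Node("b"), Node("c")
link(a, b)
link(b, c)
link(c, a)

# Inject one article at node a; the flood terminates by itself.
a.receive("<1@example>", [])
print(a.received, b.received, c.received)
```

Each node accepts the article exactly once. The Path check saves some pointless offers, and the Message-ID history catches the duplicates that still arrive via a different route (node c gets offered the article twice here).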