|
|
|
|
|
by georgeburdell
223 days ago
|
|
Mine was an automated file transfer system that had to be 100% reliable on an insanely unreliable network (~95% uptime). Took about 9 months of bug squashing after development was done. So many edge cases. I would probably never mention this in a job interview because I doubt most people would understand why it was so hard. |
|
We needed to do a nightly transfer of data. We had a variable amount of data to transfer, but typically in the range of one to two TB. We had a 1GBit link between the data centres housing the two systems, but it wasn't an exclusive link - backups and other stuff would be running during the night as well, so if we hog all bandwidth we'd have to deal with unhappy people. Hard deadline for the transfer was start of the work day.
Now the data does compress easily - but the data is only available for compression at the beginning of our sync window. We definitely need to compress some of the data to sync everything in time, and keep other users of the line happy. But: If we spend too much time on the compressing we might not have enough time left to send the data, plus we're not alone on the systems - other people will be unhappy about their nightly jobs failing if we hog all the available CPU time.
So we needed to find the right balance of data compression and bandwidth utilisation, taking into account all those factors, to make things work in the amount of time we had available.
Thanks to AMD nowadays we'd just throw more CPUs at the problem, but back then the 8 CPU server we were using was already quite expensive.