| "We built a video stream processor by splitting every 1080p+, multi-hour, 30-60fps video into individual images and copying them across networks multiple times." Not surprising that didn't go well. This strikes me as a punching bag example. Anyone who has worked with images, video, 3D models, or even just really large blocks of text or numbers before (any kind of actually "big data") knows how much work goes into NOT copying the frames/files around unnecessarily, even in memory. Copying them across the network is just a completely naive first pass at implementing something like this. Video processing is very definitely a job where you want to bring the functions to the data. That is why graphics card APIs are built the way they are. You don't see OpenGL offering a ton of functions to copy the framebuffers into RAM so you can work on them there, only to copy them back to the video card. And if you did that, you would quickly find out that you can be 10x to 100x more efficient by just learning compute shaders or OpenCL. You could still do this in a distributed fashion, but it would have to look more like Hadoop jobs. I predict the final answer here, if they want to be reasonably fast as well, is going to be sending the videos to G4 instances and switching the detectors over to a shader language. In general, if the data is much bigger than the code in bytes, move the code, not the data. IO is almost always the most expensive part of any data processing job. If you're going to do highly scalable data processing, you need to be measuring how much time you spend on IO versus actually running your processing job, per record. That will make it dead obvious where you should spend your optimization efforts. |
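To make the "measure IO versus processing, per record" point concrete, here is a minimal sketch in Python. The names `run_job`, `Timings`, `load` and `process` are hypothetical, made up for illustration; `load` stands in for whatever fetches a record (disk, network, object store) and `process` for the actual work.

```python
import time
from dataclasses import dataclass


@dataclass
class Timings:
    """Accumulated wall-clock time split into IO and compute."""
    io_seconds: float = 0.0
    compute_seconds: float = 0.0
    records: int = 0

    def io_fraction(self) -> float:
        """Share of total measured time spent fetching records."""
        total = self.io_seconds + self.compute_seconds
        return self.io_seconds / total if total else 0.0


def run_job(sources, load, process):
    """Run load() then process() for each source, timing the two separately.

    `load` and `process` are caller-supplied callables (assumptions of this
    sketch, not part of any real framework).
    """
    timings = Timings()
    results = []
    for src in sources:
        t0 = time.perf_counter()
        record = load(src)               # IO: fetch the record
        t1 = time.perf_counter()
        results.append(process(record))  # compute: the actual job
        t2 = time.perf_counter()
        timings.io_seconds += t1 - t0
        timings.compute_seconds += t2 - t1
        timings.records += 1
    return results, timings
```

If `timings.io_fraction()` comes back close to 1, the processing code is not the bottleneck and the time is better spent on moving less data. This only measures wall-clock time in a single thread; overlapping IO with compute would need a different accounting.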
Of course the only rational take on monoliths versus microservices is "use the right tool for the job".
But systems design interviews, FAANG, 'thought leaders', etc. basically ignore this nuance in favour of something like the following.
Question: design pastebin (edit, I of course mean a URL shortener not pastebin)
Rational first-pass but "wrong" answer: have a monolith that chucks the URL in the database.
Whereas the only winning answer is going to have a bunch of services, separate persistence and caching, a CDN, load balancing, replicas, probably a DNS and a service mesh chucked in for good measure.
I think this article shows that this is training and producing people who can't even think of the obvious first answer, because they have been so thoroughly indoctrinated.
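For reference, the "monolith that chucks the URL in the database" answer is roughly this much code. This is only a sketch under stated assumptions: SQLite via Python's standard `sqlite3` module as the database, base62 encoding of the row id as the short code, and a hypothetical `Shortener` class with no HTTP layer, auth, or rate limiting.

```python
import sqlite3
import string

# Base62 alphabet: digits, then lowercase, then uppercase letters.
ALPHABET = string.digits + string.ascii_letters


def encode(n: int) -> str:
    """Turn a non-negative row id into a short base62 code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def decode(code: str) -> int:
    """Inverse of encode(); raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


class Shortener:
    """Hypothetical single-process shortener backed by one SQLite table."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls "
            "(id INTEGER PRIMARY KEY, url TEXT NOT NULL)"
        )

    def shorten(self, url: str) -> str:
        cur = self.db.execute("INSERT INTO urls (url) VALUES (?)", (url,))
        self.db.commit()
        return encode(cur.lastrowid)

    def resolve(self, code: str):
        """Return the stored URL, or None if the code is unknown or invalid."""
        try:
            row_id = decode(code)
        except ValueError:
            return None
        row = self.db.execute(
            "SELECT url FROM urls WHERE id = ?", (row_id,)
        ).fetchone()
        return row[0] if row else None
```

Caching, replicas and the rest can be bolted on once measurements say they are needed; the point is that this is the obvious starting place.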