| > "As a result, primary databases (e.g. MySQL, Mongo etc.) almost never work" I mean, they do. As far as I'm aware, Facebook's ad platform is mostly backed by hundreds of thousands of MySQL instances. But more importantly, this post doesn't really describe issues of scale. Sure, it has the stages of recommendation, which might or might not be correct, but it doesn't describe how all of those processes are scheduled and coordinated, or how they communicate. Stuff at scale is normally the result of tradeoffs: sure, you can use an ML model to increase a retention metric by 5%, but it costs an extra 350ms to generate and will quadruple the load on the backend during certain events. What about the message passing? Is it one monolith making the recommendation (cuts down on latency, kids!) or microservices? What happens if a message doesn't arrive: do you have a retry? What have you done to stop retry storms? Did you bound your queue properly? None of this is covered, and, my friends, that is 90% of the "architecture at scale" that matters. Normally stuff at scale starts as "no clever shit", followed by "fine, you can have that clever shit, just document it clearly. Oh, you've left", which descends into "god, this is scary and exotic", finally leading to "let's spend half a billion making a new one with all the same mistakes." |
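To make the retry and queue questions concrete, here is a rough Python sketch of the boring defaults I have in mind: a hard cap on attempts, exponential backoff with full jitter so clients don't all retry in lockstep, and a bounded queue that sheds load instead of growing forever. Every name here (`backoff_delays`, `call_with_retry`, `enqueue_or_shed`) and every number (4 attempts, 0.1s base, 5s cap) is made up for illustration; it is not taken from the post or from any particular system.

```python
import queue
import random
import time


def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one 'full jitter' delay per retry: uniform in [0, min(cap, base * 2**n))."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(send, message, max_attempts=4, sleep=time.sleep):
    """Call send(message) at most max_attempts times, backing off between tries."""
    delays = backoff_delays(max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure instead of retrying forever.
            sleep(next(delays))


def enqueue_or_shed(q, message):
    """Put message on a bounded queue; return False (drop it) when the queue is full."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        return False
```

With `queue.Queue(maxsize=2)`, a third `enqueue_or_shed` call returns `False` rather than blocking the producer or letting the backlog grow. Whether you drop, block, or push back on the caller is exactly the kind of tradeoff I'd want an "at scale" post to spell out.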