by KaiserPro 1528 days ago

> As a result, primary databases (e.g. MySQL, Mongo etc.) almost never work

I mean, they do. As far as I'm aware, Facebook's ad platform is mostly backed by hundreds of thousands of MySQL instances.

But more importantly this post really doesn't describe issues of scale.

Sure, it lists the stages of recommendation, which might or might not be correct, but it doesn't describe how all of those processes are scheduled and coordinated, or how they communicate.

Stuff at scale is normally a result of tradeoffs. Sure, you can use an ML model to increase a retention metric by 5%, but it costs an extra 350ms to generate and will quadruple the load on the backend during certain events.

What about the message passing? Is it one monolith making the recommendation (cuts down on latency, kids!) or microservices? What happens if a message doesn't arrive: do you have a retry? What have you done to stop retry storms?
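
To make the retry-storm question concrete, here is a minimal sketch of one common mitigation: a hard cap on attempts plus capped exponential backoff with random jitter, so clients that failed together do not all retry at the same instant. The names `backoff_delays` and `send_with_retry`, the default timings, and the choice of `ConnectionError` as the retryable failure are all assumptions for illustration, not anything from the original article.

```python
import random
import time


def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    for attempt in range(max_attempts - 1):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random point in [0, ceiling) so retries spread out.
        yield rng() * ceiling


def send_with_retry(send, message, max_attempts=4, sleep=time.sleep, **backoff):
    """Call send(message) until it succeeds or the attempt budget is spent.

    `send` is a hypothetical callable that raises ConnectionError on failure.
    """
    delays = backoff_delays(max_attempts, **backoff)
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up rather than retry forever
            sleep(next(delays))
```

The attempt cap bounds how much extra load a single failed request can generate, and the jitter keeps a burst of failures from turning into a synchronized burst of retries.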

Did you bound your queue properly?
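
As a rough illustration of what "bounded" can mean here, the sketch below wraps Python's standard `queue.Queue` with a fixed `maxsize` and rejects new work when full instead of blocking or growing without limit. `BoundedInbox`, `offer` and `poll` are hypothetical names, and dropping on overflow is only one possible policy (blocking the producer or shedding the oldest item are others).

```python
import queue


class BoundedInbox:
    """A fixed-capacity inbox that sheds load when full (illustrative only)."""

    def __init__(self, maxsize):
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # how many items were rejected because the queue was full

    def offer(self, item):
        """Try to enqueue without blocking; return False if the queue is full."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def poll(self):
        """Return the next item, or None if the queue is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None
```

Returning `False` from `offer` pushes the overload signal back to the producer, which can then slow down or fail fast rather than letting latency pile up in an ever-growing backlog.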

None of this is covered, and my friends, that is 90% of the "architecture at scale" that matters.

Normally stuff at scale is "no clever shit" followed by "fine you can have that clever shit, just document it clearly, oh you've left" which descends into "god this is scary and exotic" finally leading to "let's spend half a billion making a new one with all the same mistakes."

4 comments

xico 1528 days ago

Meta is relatively open (and open source) in how they handle stuff, including ranking, scoring and filtering described in the original article, but also fast inverted indexes and approximate nearest neighbors in high-dimensional spaces. See, for instance, Unicorn [1,2] or (at a lower level) FAISS [3].

[1] http://people.csail.mit.edu/matei/courses/2015/6.S897/readin...

[2] https://dl.acm.org/doi/pdf/10.1145/3394486.3403305

[3] https://faiss.ai/

judge2020 1528 days ago

> As far as I'm aware Facebook's ad platform is mostly backed by hundreds of thousands of MySQL instances.

Same for YouTube itself (https://www.mysql.com/customers/view/?id=750), and they use Vitess for horizontal scaling: https://vitess.io/

emptysea 1528 days ago

YouTube has since migrated to Spanner. There’s a podcast episode with one of the Vitess creators that covers the politics of the switch.

ochoseis 1528 days ago

That sounds interesting — do you have a link?

emptysea 1527 days ago

Yeah, it was a Sourcegraph podcast episode from a while ago: https://about.sourcegraph.com/podcast/sugu-sougoumarane/

efsavage 1528 days ago

> mostly backed by hundreds of thousands of MySQL instances

Kind of. It's part of the recipe, but one thing you find at these large tech companies (I've worked at FB and GOOG) is that they have the resources to bend even large/standard projects like MySQL to their will, while ideally preserving the good ideas that made them popular in the first place. There are wrappers/layers/modifications/etc. that eventually evolve to subsume the original software, such that it is acting more like a library than a standalone service/application. So, for example, while your data might eventually sit in a MySQL table, you'll never know, and you likely didn't write anything specific to MySQL (or even SQL) to get it there.

samhw 1528 days ago

I mean, this post from a year ago makes it sound not that non-standard: https://engineering.fb.com/2021/07/22/data-infrastructure/my...

What you're describing sounds like you mean something on the level of Cockroach, talking the Postgres wire protocol but implemented entirely independently underneath (which came indirectly out of Google). Facebook's MySQL deployment sounds more like a heavily-patched-but-basically-MySQL installation. I think Facebook is overanalogised to Google sometimes, as an engineering org.

(Admittedly I haven't worked at either whereas you have - though I have at another FAANG fwliw - but am basing this impression partly on what I hear from friends & partly on plain old stuff I read on the internet.)

Shish2k 1527 days ago

FB uses MySQL in two very different ways. For the giant social-network database, MySQL is basically a key-value store used as the storage layer for the graph database built on top. Then for the thousands of small utility databases (small enough to fit on a single machine), it’s used in a very vanilla way.
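
For readers who have not seen the "SQL database as a key-value store under a graph layer" pattern, here is a toy sketch of the general idea. It is not Facebook's actual schema or API: the `GraphStore` class, the `kv` table, and the key layout are all invented for illustration, and SQLite (from Python's standard library) stands in for MySQL so the example runs anywhere.

```python
import json
import sqlite3


class GraphStore:
    """Toy graph API over a single key-value table (illustrative only)."""

    def __init__(self, conn):
        self.conn = conn
        # The relational engine only ever sees opaque keys and JSON blobs.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def _get(self, key):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def _put(self, key, value):
        # SQLite upsert syntax; on MySQL the equivalent would be REPLACE INTO
        # or INSERT ... ON DUPLICATE KEY UPDATE.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
            (key, json.dumps(value)),
        )

    def add_node(self, node_id, **attrs):
        self._put(f"node:{node_id}", attrs)

    def get_node(self, node_id):
        return self._get(f"node:{node_id}")

    def add_edge(self, src, edge_type, dst):
        key = f"edge:{src}:{edge_type}"
        neighbours = self._get(key) or []
        if dst not in neighbours:
            neighbours.append(dst)
        self._put(key, neighbours)

    def neighbours(self, src, edge_type):
        return self._get(f"edge:{src}:{edge_type}") or []
```

Callers only deal with nodes and edges, and the only SQL involved is a primary-key lookup and an upsert, which is roughly what "basically a key-value store" implies.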

whimsicalism 1528 days ago

I disagree - this seems quite clearly to address issues of scale, going into multiple-pass ranking, etc. etc.
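
For context, "multiple-pass ranking" usually means spending a cheap score on every candidate and an expensive one only on a shortlist. A minimal sketch of that shape, with hypothetical names (`multi_pass_rank`, `cheap_score`, `expensive_score`) and no claim to match the article's actual pipeline:

```python
import heapq


def multi_pass_rank(candidates, cheap_score, expensive_score, shortlist_size, final_size):
    """Two-pass ranking: a cheap filter over everything, a costly re-rank of the survivors."""
    # Pass 1: the inexpensive scorer runs once per candidate.
    shortlist = heapq.nlargest(shortlist_size, candidates, key=cheap_score)
    # Pass 2: the expensive scorer (e.g. a large model) only sees the shortlist.
    return heapq.nlargest(final_size, shortlist, key=expensive_score)
```

The cost of the expensive scorer then scales with `shortlist_size` rather than with the full candidate set, which is the scaling argument for doing it in passes.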
