Hacker News
by anon25783 1066 days ago
It may also be worth pointing out that, for any given person, with a pool of one billion eligible partners, there is likely to be on the order of one to ten million "perfect" partners, i.e. partners with whom one would be potentially able to form a stable, happy marriage. Per xkcd's blog post on the subject: https://what-if.xkcd.com/9/

This is not to detract from the utility of the algorithm per se, but rather to question its relevance to pairing up partners in a dating app. I'd guess that factors like physical proximity, speaking the same language, being in roughly the same age range, having the same political views, etc. can be used to straightforwardly narrow down the eligible pool for each user to about a thousand or so, at which point the algorithm the author describes as trivial is sufficient.
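To make the narrowing step concrete, here is a minimal sketch of hard-constraint filtering. The `User` fields, the 50 km radius, and the 5-year age gap are all assumptions picked for illustration, not anything taken from the article.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class User:
    # Hypothetical profile fields, chosen only for illustration.
    name: str
    lat: float
    lon: float
    age: int
    language: str


def haversine_km(a: User, b: User) -> float:
    """Great-circle distance between two users in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def eligible_pool(user, candidates, max_km=50.0, max_age_gap=5):
    """Keep only candidates that pass every hard constraint (assumed thresholds)."""
    return [
        c for c in candidates
        if c is not user
        and c.language == user.language
        and abs(c.age - user.age) <= max_age_gap
        and haversine_km(user, c) <= max_km
    ]


alice = User("alice", 52.52, 13.405, 30, "de")
others = [
    User("bob", 52.39, 13.06, 32, "de"),     # nearby, same language, close in age
    User("carol", 48.137, 11.575, 29, "de"), # too far away
    User("dave", 52.50, 13.40, 45, "de"),    # age gap too large
    User("erin", 52.51, 13.41, 31, "en"),    # different language
]
pool = eligible_pool(alice, others)
```

Only the filtered pool would then need to be fed to the matching step, which is what keeps the per-user problem small.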

Also:

> With 1 Billion candidates for every person, you’d need 8 Million Terabytes of RAM to start with the classic algorithm. Multiply that by $10'000 per Terabyte of RAM, and you get a jaw-dropping $80 Billion. Practically infeasible and exorbitantly expensive.

Maybe I'm just missing something, and if so I'd love to be corrected, but I feel like if you had 2 billion users on your app, operating costs of $80 billion would not be out of the question.
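For what it's worth, the quoted figures can be reproduced with a quick back-of-the-envelope check. The 8 bytes per entry and the one-billion-people-per-side split are my assumptions; the article only states the totals.

```python
CANDIDATES_PER_PERSON = 10**9  # from the quoted passage
PEOPLE = 10**9                 # assumption: one preference list per person on one side
BYTES_PER_ENTRY = 8            # assumption: one 64-bit value per list entry
USD_PER_TB = 10_000            # from the quoted passage
USERS = 2 * 10**9              # both sides of the matching

total_bytes = PEOPLE * CANDIDATES_PER_PERSON * BYTES_PER_ENTRY
total_tb = total_bytes // 10**12   # decimal terabytes
ram_cost = total_tb * USD_PER_TB
cost_per_user = ram_cost / USERS

print(f"{total_tb:,} TB")           # 8,000,000 TB
print(f"${ram_cost:,}")             # $80,000,000,000
print(f"${cost_per_user:.2f}")      # $40.00
```

Under those assumptions the RAM bill works out to about $40 per user.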

4 comments

In practice, a lot of people are just not compatible with anyone.
On the second paragraph: That’s exactly how it worked originally. You can create composite embeddings, and then define some hybrid similarity measure that combines physical proximity, sentiment of writing, and some categorical tags.
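A rough sketch of what such a hybrid measure could look like, in plain Python. This is not the USearch API; the profile layout, the weights, and the `km_scale` decay constant are all made up for illustration.

```python
from math import exp, sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap of two tag sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_similarity(p, q, weights=(0.5, 0.3, 0.2), km_scale=25.0):
    """Weighted sum of text, proximity, and tag similarity.

    p and q are hypothetical profiles: dicts with 'text' (embedding as a list
    of floats), 'xy_km' (planar position in km), and 'tags' (a set of strings).
    """
    w_text, w_geo, w_tags = weights
    dx = p["xy_km"][0] - q["xy_km"][0]
    dy = p["xy_km"][1] - q["xy_km"][1]
    proximity = exp(-sqrt(dx * dx + dy * dy) / km_scale)  # 1.0 at zero distance
    return (
        w_text * cosine_similarity(p["text"], q["text"])
        + w_geo * proximity
        + w_tags * jaccard(p["tags"], q["tags"])
    )


a = {"text": [1.0, 0.0], "xy_km": (0.0, 0.0), "tags": {"hiking", "jazz"}}
b = {"text": [0.0, 1.0], "xy_km": (300.0, 400.0), "tags": {"chess"}}
same = hybrid_similarity(a, a)
far = hybrid_similarity(a, b)
```

In a real index the three parts would more likely be packed into one composite vector with a custom metric, but the weighting idea is the same.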

Coding it and experimenting with metrics, however, was much harder then. There was no Numba and I had to implement every experiment in C++, without JIT-ing: https://github.com/unum-cloud/usearch/releases/tag/v0.19.0

The problem space is much smaller than 1B. I’d be surprised if it were much larger than 10k when controlled for practical factors like proximity, age, etc.
On the last point: Somehow I haven’t even considered that :) Still, that’s just the RAM cost. The overall cluster would be much more expensive.