by xxbondsxx 3874 days ago
Hey everyone! FB engineer here who wrote the original code for this and gave the talk during @Scale 2015. There's a recording of our talk here: https://youtu.be/ptsCWGZW_P8?t=333

That's a bit more visual and easier to understand. At the end of the day it's really just DFS with seen state and selective exploration :)
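For readers who want something concrete, here is a minimal sketch of one way to read "DFS with seen state and selective exploration". It is not the actual Safety Check code. The names `selective_dfs`, `friends`, and `in_area` are hypothetical, and the in-memory dict stands in for a real social graph.

```python
from typing import Callable, Dict, Iterable, List, Set


def selective_dfs(
    seeds: Iterable[str],
    friends: Dict[str, List[str]],   # hypothetical adjacency list: user -> friends
    in_area: Callable[[str], bool],  # hypothetical predicate: is this user in the affected area?
) -> List[str]:
    """Return users reachable from the seeds through chains of in-area users."""
    seen: Set[str] = set()      # "seen state": never process a user twice
    matched: List[str] = []
    stack: List[str] = list(seeds)

    while stack:
        user = stack.pop()      # LIFO pop gives depth-first order
        if user in seen:
            continue
        seen.add(user)

        # "Selective exploration": only expand users who are in the area,
        # so the walk does not wander across the whole graph.
        if not in_area(user):
            continue

        matched.append(user)
        for friend in friends.get(user, []):
            if friend not in seen:
                stack.append(friend)

    return matched


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["e"], "d": [], "e": ["f"], "f": []}
    affected = {"a", "b", "d", "f"}
    print(selective_dfs(["a"], graph, lambda u: u in affected))
```

In this toy graph, "f" is in the area but only reachable through users outside it, so the walk never gets there. That is the trade-off of pruning at out-of-area users.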

2 comments

How does FB decide which disasters qualify for the safety check and which do not?
As described in a post by Zuckerberg, before the Paris attack it was only activated for natural disasters. Paris was the first time it was activated for human disasters and they will be doing it more in the future.
Respectfully, that doesn't answer the question. What is the current policy for determining use?
Probably "If the people in charge of running it decide it is worthy"
The real reason, which they probably can't say, is how much media attention it gets in the USA.
Why was this down-voted? Doesn't it have a high chance of stimulating meaningful discussion?
I'm entertaining the notion that this is a way to justify some new privacy-invading feature. I'd love to be wrong, and I'm open to a statement from a FB employee, if such a thing were possible.
Or perhaps it motivates the idea that if you do not use Facebook you can never be completely "safe".
"Never let a good crisis go to waste."
I think the post had a bit of a frantic, working-backwards, PR-management tinge to it. A cyclone/tsunami also takes place over an extended period of time, and is very similar to a human disaster.

In any case, it is a really nice feature and the fact that they will start using it for more disasters is undoubtedly a good thing.

Sorry, I don't have time to watch the video right now, but I am wondering whether you could use something like Hadoop/Hive/Presto to simply get a list of all users in Paris on demand.
Hive and Hadoop are offline -- it can take ~45 minutes to execute a query on our entire user table (even longer if it involves joins), and at certain times of the day it's slower (usually during work hours). Not only that, but once the query finishes, some engineer has to copy and paste the results into a script that would likely run on one machine.

Doing this in a distributed async job fashion allowed for a lot more flexibility. Even better, we can change the geographic area as the algorithm runs and those changes are reflected immediately.
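As a rough illustration of the async-job idea, here is a single-process sketch using Python's `asyncio` as a stand-in for a real distributed job queue. Everything here (`Area`, `worker`, `run`, the `city_of` lookup) is a hypothetical name for illustration, not Facebook's infrastructure. The point it shows is that each job checks the area at the moment it runs, so an edit to the area applies to every job that has not run yet.

```python
import asyncio
from typing import Dict, List, Set


class Area:
    """Hypothetical mutable area definition shared by all workers."""

    def __init__(self, cities: Set[str]) -> None:
        self.cities = set(cities)

    def contains(self, city: str) -> bool:
        return city in self.cities


async def worker(queue, friends, city_of, area, seen, notified) -> None:
    while True:
        user = await queue.get()
        try:
            # The area is consulted when the job runs, not when it was queued,
            # so changes to `area.cities` affect all jobs still pending.
            if area.contains(city_of[user]):
                notified.append(user)
                for friend in friends.get(user, []):
                    if friend not in seen:
                        seen.add(friend)          # mark at enqueue time to avoid duplicates
                        queue.put_nowait(friend)  # fan out as a new async job
        finally:
            queue.task_done()


async def run(seeds: List[str], friends: Dict[str, List[str]],
              city_of: Dict[str, str], area: Area, n_workers: int = 3) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    seen: Set[str] = set(seeds)
    notified: List[str] = []
    for seed in seeds:
        queue.put_nowait(seed)

    workers = [
        asyncio.create_task(worker(queue, friends, city_of, area, seen, notified))
        for _ in range(n_workers)
    ]
    await queue.join()            # wait until every queued job has been processed
    for task in workers:
        task.cancel()             # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return notified
```

One limitation of this sketch: a user who was skipped while outside the area is already marked as seen, so widening the area later will not revisit them. A real system would need some way to re-queue those users.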