HN Mirror



	by nathanmarz 5506 days ago
	To do that query on Neo4j, you would need to store in memory on one machine the entire Twitter social graph, all the people who tweeted every URL ever tweeted on Twitter, and then do the computation on a single thread. Neo4j can't handle that scale. The reach computation on Storm does everything in parallel (across however many machines you need to scale the computation) and gets data using distributed key/value databases (Riak, in our case).
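	For readers who haven't seen the reach computation, here is a rough single-process sketch of its shape in Python. It is not BackType's or Storm's actual code. The two dicts are hypothetical stand-ins for the distributed key/value lookups (Riak in the comment above), the function names and sample data are invented for illustration, and a thread pool stands in for the many machines a real topology would use. The idea: look up everyone who tweeted the URL, fetch each tweeter's followers in parallel, split the followers into partitions by hash so each partition can be de-duplicated independently, then sum the per-partition counts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the distributed key/value stores.
TWEETERS_BY_URL = {
    "http://example.com/a": ["alice", "bob"],
    "http://example.com/b": ["carol"],
}
FOLLOWERS_BY_USER = {
    "alice": ["dave", "erin", "frank"],
    "bob": ["erin", "grace"],
    "carol": ["dave"],
}


def get_tweeters(url):
    """Return everyone who tweeted the URL (one key/value lookup)."""
    return TWEETERS_BY_URL.get(url, [])


def get_followers(user):
    """Return the followers of one user (one key/value lookup)."""
    return FOLLOWERS_BY_USER.get(user, [])


def reach(url, num_partitions=4, max_workers=8):
    """Count the distinct followers of everyone who tweeted the URL."""
    tweeters = get_tweeters(url)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan out: fetch each tweeter's follower list in parallel.
        follower_lists = list(pool.map(get_followers, tweeters))

        # Group by follower so the same person always lands in the same
        # partition; each partition can then be de-duplicated on its own.
        partitions = [set() for _ in range(num_partitions)]
        for followers in follower_lists:
            for follower in followers:
                partitions[hash(follower) % num_partitions].add(follower)

        # Count each partition independently, then combine the counts.
        counts = list(pool.map(len, partitions))
    return sum(counts)


if __name__ == "__main__":
    print(reach("http://example.com/a"))
```

	In this toy data, erin follows both alice and bob but is counted once. Because no single step needs the whole graph in memory, each stage can be spread over as many workers as the data requires.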

1 comment

herdrick 5506 days ago

Nathan, we'd love to hear your postmortem on BackType's experience with Neo4j, and how Sphinx is turning out.


nathanmarz 5506 days ago

We used Neo4j over a year ago, and it was pretty unstable when we used it. The database files were getting corrupted pretty frequently (a few times a week), so it just didn't work out for us. Ultimately it was for a small feature, so rather than continue to struggle with Neo4j we just reimplemented the feature using Sphinx. Like I said, that was a long time ago and Neo4j may have gotten a lot better since then.


herdrick 5506 days ago

OK, thanks! That's valuable info.
