Hacker News new | ask | show | jobs
by sinaa 3036 days ago
Great work!

Do you simply do a BFS to find the shortest paths? If so, are you doing any tricks to avoid the path explosion problem?

2 comments

Thanks! I'm glad you asked. I actually do what I call a bi-directional breadth first search[1]. The gist of it is that instead of just doing a BFS from the source node until I reach the target node, I do a reverse BFS from the target node as well and wait until the two searches overlap. That helps with the exploding path problem, although that still becomes an issue for longer paths (>= 5 degrees generally). I also pre-compute all the incoming and outgoing links for each page when I create the database[2] so I don't need to do that upon every search, which resulted in a huge performance boost.

[1] https://github.com/jwngr/sdow/blob/a2699dc95d884ec64a4641630... [2] https://github.com/jwngr/sdow/blob/a2699dc95d884ec64a4641630...

When I was first playing with this, I actually really expected you to be using an expensive performant solution like neo4j. When I read that you were using sqlite, I didn't believe it at first and thought it was a mistype from dev until I looked over the source.

That's an impressive and well thought out performance enhancement, and that the app runs so blazingly fast on sqlite is very impressive.

My thoughts exactly. It's impressive that despite relying on BFS, the tool manages to be so fast on a tiny node on google cloud. Even when there's a depth of >= 5 !
Especially SQLite is often overlooked. A marvelous library that should be part of every personal toolkit.
Neo4j is not necessarily more performant, especially for a particular use case like this.
The downside of bidirectional seems to be that things like place names are linked via short paths through boring articles like "Census designated place" or "City"
The bi-directional nature of the search does not change the end result. It is simply a performance improvement.
Still - It can be annoying. Maybe there could be an option to exclude articles with specific text/link ratios?
It is bidirectional BFS: https://github.com/jwngr/sdow/blob/master/sdow/breadth_first...

A* can't be used given that path cost or expected remaining distance is unknown.

Any ideas on how such an algorithm could be used without precomputing the entire graph?