Hacker News
by dcposch 1742 days ago
This is incredible. How big of a VPS would you need to preprocess the whole dataset now?

If I use peermaps and zoom to a particular city, how does it find peers that have that particular part of the db?

2 comments

It's similar to how, with a torrent, you can seek to a particular spot in a file and start playing by requesting the chunks at that spot. Some clients, like webtorrent, support this behavior, but it changes the dynamics of the network somewhat if many clients do it. You can build supplementary peer info to help the process along on different p2p networks, depending on whether they let you create side channels or make more explicit connections to peers. For peermaps, the database is file and directory based, so most of that peer tree traversal should already be handled by the network. There are also more ways to optimize the connections with additional tricks once the basics are working with a somewhat slower and less sophisticated transfer method.
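To make the chunk-seeking idea concrete, here is a rough sketch of how a client could work out which torrent pieces cover a byte range it wants to read first. It is not code from webtorrent or peermaps; the function name `pieces_for_range` and its parameters are made up for illustration, and it only assumes the usual torrent model of fixed-size pieces over the concatenated file data.

```python
def pieces_for_range(offset, length, piece_length, file_start=0):
    """Return (first, last) inclusive piece indices covering the bytes
    [offset, offset + length) of a file.

    file_start is the hypothetical byte position where this file begins
    inside the torrent's concatenated data (0 for a single-file torrent).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    start = file_start + offset
    end = start + length - 1  # last byte we actually need
    return start // piece_length, end // piece_length


# Seek 10 MiB into a file and read 1 MiB, with 256 KiB pieces.
first, last = pieces_for_range(10 * 1024 * 1024, 1024 * 1024, 256 * 1024)
print(first, last)  # 40 43
```

A client that seeks this way asks for pieces 40 through 43 before anything else, rather than picking rarest pieces first, which is the change in network dynamics mentioned above.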
The VPS we're running on has 60GB of RAM, which should be plenty, but the ingest program needs more work to use less memory so that it stops crashing when denormalizing multipolygon relations. That involves denormalizing ways, which in turn fetch nodes, all referenced by ID, and those IDs have little locality across the pbf file. If you write to temporary storage instead, it can use a lot of disk, and denormalization based on the on-disk format can get really slow. It's just all very tricky to get working well within reasonable time constraints (ideally less than a week of processing) and a reasonable memory footprint.
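For anyone unfamiliar with the lookup chain described above, a toy sketch of it might look like the following. This is not the peermaps ingest program; the dict-based tables and the function names are assumptions chosen to show the shape of the problem, where every relation pulls in ways and every way pulls in nodes by ID.

```python
# Toy in-memory tables keyed by OSM-style IDs (all values are made up).
nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
ways = {10: [1, 2, 3], 11: [3, 4, 1]}
relations = {100: [10, 11]}  # a multipolygon built from two ways


def denormalize_way(way_id, ways, nodes):
    """Replace each node ID in a way with its (lon, lat) pair."""
    return [nodes[ref] for ref in ways[way_id]]


def denormalize_relation(rel_id, relations, ways, nodes):
    """Replace each member way with its list of coordinates."""
    return [denormalize_way(w, ways, nodes) for w in relations[rel_id]]


print(denormalize_relation(100, relations, ways, nodes))
```

With everything in dicts the lookups are fast, but the node table for the whole planet has to fit in RAM. Moving the tables to disk trades that memory for random reads, because the IDs a given way needs can be scattered anywhere in the file.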