by sophia01 308 days ago
They're not open sourcing it though?

It's a bit difficult right now, given that we have a lot of proprietary data and much of the logic follows it. I'm hoping we can get it to a state where it can index and serve OSM data, but that is going to take some time.

That being said, we are currently working on getting our Google S2 Rust bindings open-sourced. This is a geo-hashing library that makes it very easy to write a reverse geocoder, even from a point-in-polygon or polygon-intersection perspective.
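
To make the point-in-polygon idea concrete, here is a rough sketch in plain Python. It does not use S2 or our bindings: a flat latitude/longitude grid stands in for S2 cells, and `CELL_DEG`, `cell_id`, and `ReverseGeocoder` are hypothetical names made up for illustration. The general shape is the same, though: bucket polygons by cell, then run an exact containment test only on the candidates in the query point's cell.

```python
from collections import defaultdict
from math import floor

# Hypothetical cell size in degrees. S2 uses hierarchical cells on a
# sphere; a flat grid is only a simplified stand-in for this sketch.
CELL_DEG = 1.0


def cell_id(lat, lng):
    """Map a point to a coarse grid cell (stand-in for an S2 cell id)."""
    return (floor(lat / CELL_DEG), floor(lng / CELL_DEG))


def point_in_polygon(lat, lng, ring):
    """Ray-casting containment test; ring is a list of (lat, lng) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        lat_i, lng_i = ring[i]
        lat_j, lng_j = ring[j]
        # Does the edge between vertices j and i straddle the point's latitude?
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which the edge crosses that latitude.
            cross = lng_i + (lat - lat_i) * (lng_j - lng_i) / (lat_j - lat_i)
            if lng < cross:
                inside = not inside
        j = i
    return inside


class ReverseGeocoder:
    def __init__(self):
        self.index = defaultdict(list)  # cell id -> [(name, ring), ...]

    def add(self, name, ring):
        """Register a polygon under every cell its bounding box touches."""
        lats = [p[0] for p in ring]
        lngs = [p[1] for p in ring]
        lo = cell_id(min(lats), min(lngs))
        hi = cell_id(max(lats), max(lngs))
        for a in range(lo[0], hi[0] + 1):
            for b in range(lo[1], hi[1] + 1):
                self.index[(a, b)].append((name, ring))

    def lookup(self, lat, lng):
        """Return the names of indexed polygons that contain the point."""
        candidates = self.index.get(cell_id(lat, lng), [])
        return [name for name, ring in candidates
                if point_in_polygon(lat, lng, ring)]
```

Usage would be something like `g = ReverseGeocoder(); g.add("square", ring); g.lookup(lat, lng)`. This sketch ignores the antimeridian and the poles, which is part of what a real spherical library handles for you.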

Could you write a photon replacement if you had that? I would love to spend less per month running photon for my project.
Doesn't sound like it, but it's a nice writeup of the tools they stitched together. For someone to copy and open source... hopefully :)
There are a few pieces of this that rely on proprietary data, especially the FastText training step, so that's a dead end unfortunately (would love to be proven wrong). I'd consider subbing in a small BERT model with a classifier head for something FOSS without access to tons of user data, but then you lose the ability to serve high QPS.
I guess not having that would only break forward geocoding from an address?
My guess is that they're using FastText for semantic search, so it's more likely to break queries like "coffee near me" than address search, the latter likely being handled by tantivy. For context, I've also written a geocoder [0] based on tantivy. :)

[0] https://github.com/ellenhp/airmail

Wow, Airmail looks awesome. Have you ever benchmarked it on latency? I'm working on geocoding solutions for AI agents, so quick tool calls are really important.
Tempted, especially to switch to H3 instead of S2… I prototyped a similar solution a couple of weeks ago, so I could probably do a second pass.
What's wrong with S2? H3 is so much more complex for very little gain from what I can tell.