It's a bit difficult right now, since we have a lot of proprietary data and much of the logic is built around it. I'm hoping we can get it to a state where it can index and serve OSM data, but that is going to take some time.
That being said, we are currently working on getting our Google S2 Rust bindings open-sourced. This is a geo-hashing library that makes it very easy to write a reverse geocoder, even from a point-in-polygon or polygon-intersection perspective.
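To make the "cell index plus point-in-polygon" idea concrete, here is a minimal sketch using only the Rust standard library. It does not use S2 or the bindings mentioned above: a fixed lat/lng grid stands in for S2 cells, and `Region`, `CellId`, `CELL_DEG`, `cell_of`, and `ReverseGeocoder` are hypothetical names invented for illustration. It also ignores the antimeridian, the poles, and polygons with holes.

```rust
use std::collections::HashMap;

/// A named region whose boundary is a simple polygon of (lng, lat) vertices.
struct Region {
    name: String,
    ring: Vec<(f64, f64)>,
}

/// Hypothetical stand-in for an S2 cell id: a fixed-size lat/lng grid cell.
type CellId = (i32, i32);

/// Grid cell size in degrees (an arbitrary choice for this sketch).
const CELL_DEG: f64 = 0.5;

fn cell_of(lng: f64, lat: f64) -> CellId {
    ((lng / CELL_DEG).floor() as i32, (lat / CELL_DEG).floor() as i32)
}

/// Ray-casting point-in-polygon test against a single ring.
fn contains(ring: &[(f64, f64)], lng: f64, lat: f64) -> bool {
    let mut inside = false;
    let n = ring.len();
    let mut j = n.wrapping_sub(1); // previous vertex; unused when the ring is empty
    for i in 0..n {
        let (xi, yi) = ring[i];
        let (xj, yj) = ring[j];
        // Toggle when a horizontal ray from the point crosses edge (j, i).
        if (yi > lat) != (yj > lat) && lng < (xj - xi) * (lat - yi) / (yj - yi) + xi {
            inside = !inside;
        }
        j = i;
    }
    inside
}

struct ReverseGeocoder {
    regions: Vec<Region>,
    /// Cell -> indices of regions whose bounding box touches that cell.
    index: HashMap<CellId, Vec<usize>>,
}

impl ReverseGeocoder {
    fn new(regions: Vec<Region>) -> Self {
        let mut index: HashMap<CellId, Vec<usize>> = HashMap::new();
        for (id, region) in regions.iter().enumerate() {
            if region.ring.is_empty() {
                continue;
            }
            // Coarse covering: every grid cell overlapping the bounding box.
            let (mut min_x, mut min_y) = (f64::MAX, f64::MAX);
            let (mut max_x, mut max_y) = (f64::MIN, f64::MIN);
            for &(x, y) in &region.ring {
                min_x = min_x.min(x);
                min_y = min_y.min(y);
                max_x = max_x.max(x);
                max_y = max_y.max(y);
            }
            let (cx0, cy0) = cell_of(min_x, min_y);
            let (cx1, cy1) = cell_of(max_x, max_y);
            for cx in cx0..=cx1 {
                for cy in cy0..=cy1 {
                    index.entry((cx, cy)).or_default().push(id);
                }
            }
        }
        Self { regions, index }
    }

    /// Look up the cell first, then run the exact test on the few candidates.
    fn lookup(&self, lng: f64, lat: f64) -> Option<&str> {
        let candidates = self.index.get(&cell_of(lng, lat))?;
        candidates
            .iter()
            .map(|&i| &self.regions[i])
            .find(|r| contains(&r.ring, lng, lat))
            .map(|r| r.name.as_str())
    }
}
```

Building a `ReverseGeocoder` from a `Vec<Region>` and calling `lookup(lng, lat)` returns the first region containing the point, or `None`. The cell lookup narrows the candidates cheaply, so the exact polygon test only runs on a handful of regions. A real S2-based version would presumably replace the bounding-box grid with a proper cell covering of each polygon.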
There are a few pieces of this that rely on proprietary data, especially the FastText training step, so that's a dead end, unfortunately (I would love to be proven wrong). I'd consider subbing in a small BERT model with a classifier head for something FOSS without access to tons of user data, but then you lose the ability to serve high QPS.
My guess is that they're using FastText for semantic search, so it's more likely to break queries like "coffee near me" than address search, the latter likely being handled by tantivy. For context, I've also written a geocoder [0] based on tantivy. :)
Wow, Airmail looks awesome. Have you ever benchmarked its latency? I'm working on geocoding solutions for AI agents, so quick tool calls are really important.