by pbowyer 308 days ago
Doesn't sound like it, but it's a nice writeup of the tools they stitched together. For someone to copy and open source... hopefully :)

There are a few pieces of this that rely on proprietary data, especially the FastText training step, so that's a dead end unfortunately (would love to be proven wrong). I'd consider subbing in a small BERT model with a classifier head for something FOSS without access to tons of user data, but then you lose the ability to serve high QPS.
I guess not having that would only break forward geocoding from an address?
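To make the QPS point a bit more concrete: FastText-style models can turn a query into hashed character n-gram IDs, look up an embedding per ID, average them, and apply a linear layer, which is far cheaper than a transformer forward pass. Below is a minimal sketch of just the feature step. The function names, the n-gram range, the bucket count, and the use of `zlib.crc32` as the hash are all my own assumptions for illustration, not what the article's authors (or the fastText library itself) actually use.

```python
import zlib


def char_ngrams(word, min_n=3, max_n=5):
    """Return the character n-grams of a word wrapped in < > boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def bucket_ids(query, num_buckets=2_000_000):
    """Map a query to hashed n-gram bucket IDs (hypothetical helper).

    crc32 is a stand-in hash chosen because it is deterministic and in the
    standard library; it is an assumption, not fastText's real hash function.
    """
    ids = []
    for word in query.lower().split():
        for gram in char_ngrams(word):
            ids.append(zlib.crc32(gram.encode("utf-8")) % num_buckets)
    return ids
```

A classifier on top of this would only need an embedding table indexed by these IDs plus one small matrix multiply, which is why this family of models tends to be easy to serve at high request rates on CPU.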
My guess is that they're using FastText for semantic search, so it's more likely to break queries like "coffee near me" than address search, the latter likely being handled by tantivy. For context, I've also written a geocoder [0] based on tantivy. :)

[0] https://github.com/ellenhp/airmail

Wow, Airmail looks awesome. Have you ever benchmarked it on latency? I'm working on geocoding solutions for AI agents, so quick tool calls are really important.
Tempted, especially for switching to H3 instead of S2… I prototyped a similar solution a couple of weeks ago, so I could probably do a second pass.
What's wrong with S2? H3 is so much more complex for very little gain from what I can tell.