Hacker News new | ask | show | jobs
by unnouinceput 2034 days ago
As a side note, if you have the possibility to sort data before searching values in it then implement trie:

https://en.wikipedia.org/wiki/Trie

This is the best for data with very few updates/modifications and a lot of queries. One example would be points of interest on a map, like restaurants on google maps. Not like those restaurants gets updated/modified every single day but you do have clients that while using your restaurant finder app they query a lot of them on a given radius around their GPS coordinates.

I had such a project in the past. My client acquired this data, around 3 billion points of interests residing in a PostgreSQL DB that was around 900 GB. While the server was beefed with 128GB RAM, was not nearly enough to have it all in memory and it had a responsiveness of several seconds, way too much for young impatient users and my client was seeing a decline in users. I reorganized data in PGSQL to fit a trie tree and now a query was solved in milliseconds. Accounting all other stuff on top, the app was now able to generate a response under a second. Trie ftw.

2 comments

Tries are insanely good for implementing fuzzy search as well (say based on a Damerau–Levenshtein distance).

Using this approach for a typo-tolerant instant search engine that I am working on: https://github.com/typesense/typesense

If you're already using Postgres, isn't that precisely what GiST indexes are meant for?
Yes, they are. But those are general approach. While a good approach my implementation was custom made for the client's need, allowing to achieve a better performance.