| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by unnouinceput 2034 days ago

As a side note, if you have the possibility to sort data before searching values in it then implement trie:

https://en.wikipedia.org/wiki/Trie

This is the best for data with very few updates/modifications and a lot of queries. One example would be points of interest on a map, like restaurants on google maps. Not like those restaurants gets updated/modified every single day but you do have clients that while using your restaurant finder app they query a lot of them on a given radius around their GPS coordinates.

I had such a project in the past. My client acquired this data, around 3 billion points of interests residing in a PostgreSQL DB that was around 900 GB. While the server was beefed with 128GB RAM, was not nearly enough to have it all in memory and it had a responsiveness of several seconds, way too much for young impatient users and my client was seeing a decline in users. I reorganized data in PGSQL to fit a trie tree and now a query was solved in milliseconds. Accounting all other stuff on top, the app was now able to generate a response under a second. Trie ftw.

2 comments

karterk 2034 days ago

Tries are insanely good for implementing fuzzy search as well (say based on a Damerau–Levenshtein distance).

Using this approach for a typo-tolerant instant search engine that I am working on: https://github.com/typesense/typesense

link

tpolzer 2034 days ago

If you're already using Postgres, isn't that precisely what GiST indexes are meant for?

link

unnouinceput 2034 days ago

Yes, they are. But those are general approach. While a good approach my implementation was custom made for the client's need, allowing to achieve a better performance.

link