by manimino 1371 days ago
Ducks, the story:

I was using a Python in-memory vector search engine called Annoy [1] to do semantic search on various kinds of data. It worked great for finding "similar" objects: story A has similar text to story B, image A looks like image B, etc.
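For readers who haven't used vector search: finding "similar" objects comes down to finding the stored vectors closest to a query vector. Annoy does this approximately, using trees built from random projections. The sketch below swaps in an exact brute-force cosine-similarity scan instead, which is only practical for small collections. The toy embeddings and the `cosine` / `nearest` helpers are hypothetical, for illustration only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact k-nearest-neighbour search by scanning everything (Annoy approximates this)."""
    ranked = sorted(items, key=lambda key: cosine(query, items[key]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings; real ones would come from a text or image model.
embeddings = {
    "story_a": [0.9, 0.1, 0.0],
    "story_b": [0.8, 0.2, 0.1],
    "story_c": [0.0, 0.1, 0.9],
}

print(nearest(embeddings["story_a"], embeddings, k=2))  # prints ['story_a', 'story_b']
```

A brute-force scan costs O(n) per query; approximate indexes like Annoy trade a little accuracy for much faster lookups on large collections.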

So Annoy solved the hard part. But basic metadata lookups were surprisingly hard in Python. How do I get all images matching some criteria (say, a size range, or certain tags)? I'd have to serialize them all into a DB and use a DB index. Databases are great, but they add code bloat and overhead; I'm usually working in Jupyter notebooks, and I like keeping external dependencies to a minimum.
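The lookups described here (all images in a size range, or with a given tag) can be served without a database by keeping two in-memory indexes: a dict from each tag to the objects carrying it, and a list of objects sorted by size that `bisect` can slice by range. This is a minimal sketch of that idea; the `Image` class and `MetadataIndex` helper are hypothetical and are not ducks' actual API.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Image:
    # Hypothetical record type standing in for any object with metadata.
    name: str
    size: int
    tags: set[str] = field(default_factory=set)


class MetadataIndex:
    """Hypothetical in-memory index: hash lookup on tags, sorted lookup on size."""

    def __init__(self, items: list[Image]):
        # Hash index: tag -> items carrying that tag, in insertion order.
        self._by_tag: dict[str, list[Image]] = defaultdict(list)
        for item in items:
            for tag in item.tags:
                self._by_tag[tag].append(item)
        # Sorted index: items ordered by size, plus a parallel key list for bisect.
        self._sorted = sorted(items, key=lambda i: i.size)
        self._sizes = [i.size for i in self._sorted]

    def with_tag(self, tag: str) -> list[Image]:
        # .get() avoids inserting empty entries into the defaultdict.
        return list(self._by_tag.get(tag, []))

    def size_between(self, lo: int, hi: int) -> list[Image]:
        """Items with lo <= size <= hi: O(log n) to locate, plus the slice copy."""
        start = bisect.bisect_left(self._sizes, lo)
        stop = bisect.bisect_right(self._sizes, hi)
        return self._sorted[start:stop]


images = [
    Image("cat.png", 120, {"animal"}),
    Image("dog.png", 300, {"animal", "outdoor"}),
    Image("tree.png", 800, {"outdoor"}),
]
index = MetadataIndex(images)
print([i.name for i in index.size_between(100, 400)])  # prints ['cat.png', 'dog.png']
print([i.name for i in index.with_tag("outdoor")])     # prints ['dog.png', 'tree.png']
```

Combining criteria (size range AND tag) can be done by intersecting the two result lists; a real library would also need to handle updates and missing attributes.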

So I wrote ducks as a quick, convenient way to index anything.

There are lots of other usage patterns, of course; it's very generic. It makes a great Wordle / crossword solver too. "Find me words where the first letter is A and the fifth letter is L" is very fast in ducks.
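A query like "first letter is A and fifth letter is L" maps naturally onto an index keyed by (position, letter): look up the set of words for each constraint and intersect the sets. The sketch below shows that approach in plain Python; the word list and the `build_position_index` / `find_words` helpers are hypothetical, and this is not necessarily how ducks works internally.

```python
from collections import defaultdict


def build_position_index(words: list[str]) -> dict[tuple[int, str], set[str]]:
    """Map (position, letter) -> words with that letter at that 1-based position."""
    index: dict[tuple[int, str], set[str]] = defaultdict(set)
    for word in words:
        for pos, letter in enumerate(word.lower(), start=1):
            index[(pos, letter)].add(word)
    return index


def find_words(index: dict[tuple[int, str], set[str]], constraints: dict[int, str]) -> set[str]:
    """Return words matching every {position: letter} constraint (empty set if none)."""
    result = None
    for pos, letter in constraints.items():
        # .get() avoids inserting empty sets into the defaultdict.
        matches = index.get((pos, letter.lower()), set())
        result = set(matches) if result is None else result & matches
        if not result:
            break  # No word can satisfy the remaining constraints.
    return result or set()


words = ["ankle", "angel", "apple", "atoll", "basil", "modal"]
index = build_position_index(words)
print(sorted(find_words(index, {1: "a", 5: "l"})))  # prints ['angel', 'atoll']
```

Each constraint costs one dict lookup, and the intersection only ever shrinks, so queries stay fast even with large word lists.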

Indexing is just one of those things you always need. Python didn't have a good way to do it, and now it does!

Source code's here if you're curious: https://github.com/manimino/ducks

[1] Annoy: https://github.com/spotify/annoy

1 comment

Huh, that's really cool and makes a lot of sense. Thanks for sharing more about it.