Thanks for the feedback! I noticed this too, and I might try the newer models that are coming out now. I also think the approximate nearest neighbor search might introduce some errors, which is probably worth checking. It also really helps to know the whole VQGAN+CLIP space, since you can optimize query strings in a similar way, for example by prepending "a picture of" or appending a qualifier like "unsplash" for a particular style.
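One way to check whether the approximate nearest neighbor search is the culprit would be to compare its results against a brute-force search on a small sample and measure recall. This is only a rough sketch: the embedding dictionary, the function names, and the idea that the index returns a list of image ids are all assumptions for illustration, not how the site actually works.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, embeddings, k):
    """Brute-force top-k image ids by cosine similarity.

    `embeddings` is a hypothetical dict mapping image id -> vector.
    """
    ranked = sorted(
        embeddings,
        key=lambda image_id: cosine(query, embeddings[image_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data standing in for real image embeddings (made up for this sketch).
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
query = [1.0, 0.0]
exact = exact_top_k(query, embeddings, k=2)

# Pretend the approximate index returned these ids for the same query.
approx = ["a", "c"]
print(recall_at_k(approx, exact))
```

If recall stays close to 1.0 over a few hundred sample queries, the index is probably not the problem and the model or the query phrasing is the more likely cause.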
Wikimedia search seems to work better for most searches I tried, possibly because of the manual tags etc.
https://imagioo.com/?q=astronaut+with+american+flag
https://commons.wikimedia.org/wiki/Special:MediaSearch?type=...
You might want to include examples where your search is better, or just an FAQ on how to use it.
Nice idea though. It does seem to come in handy when you don't have descriptions of images, e.g. https://github.com/haltakov/natural-language-image-search