by warangal
879 days ago
I think the image encoder from CLIP (even the smallest variant, ViT-B/32) captures enough semantic information to support natural-language queries once images are indexed. Much of the work actually goes into integrating with existing metadata, such as local directory and date-time, to augment the NL query and re-rank the results. I work on such a tool [0] that enables end-to-end indexing of a user's personal photos, and I recently added functionality to index Google Photos too!

[0] https://github.com/eagledot/hachi
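To make the metadata re-ranking idea concrete, here is a rough sketch, not how hachi actually does it. It assumes the image and query embeddings were already produced by a CLIP encoder elsewhere (the 3-dimensional vectors below are toy stand-ins), and the `Photo` class, the `search` function, and the additive `boost` are all hypothetical names and choices made up for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Photo:
    path: str               # location on disk, used as directory metadata
    taken: date             # capture date, used as date-time metadata
    embedding: List[float]  # toy stand-in for a CLIP image embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(photos: List[Photo], query_embedding: List[float],
           directory: Optional[str] = None, year: Optional[int] = None,
           boost: float = 0.2, top_k: int = 3) -> List[str]:
    """Rank photos by embedding similarity, then nudge the score upward
    for each metadata hint (directory prefix, year) that matches."""
    scored = []
    for photo in photos:
        score = cosine(query_embedding, photo.embedding)
        if directory and photo.path.startswith(directory):
            score += boost  # assumed additive boost; a real tool may filter instead
        if year and photo.taken.year == year:
            score += boost
        scored.append((score, photo.path))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in scored[:top_k]]


photos = [
    Photo("trips/goa/beach.jpg", date(2022, 12, 24), [0.9, 0.1, 0.0]),
    Photo("home/dog.jpg", date(2023, 3, 2), [0.1, 0.9, 0.0]),
    Photo("trips/goa/sunset.jpg", date(2021, 5, 1), [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]  # toy stand-in for a CLIP text embedding

semantic_only = search(photos, query)
with_metadata = search(photos, query, directory="trips/", year=2021)
```

With embeddings alone the beach photo ranks first; once the directory and year hints extracted from the query are applied, the 2021 sunset photo moves ahead of it. Whether to boost or hard-filter on metadata is a design choice, and boosting is just the simpler one to show here.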