by furiousteabag
639 days ago
Thanks for sharing the Brooklyn text demo. Hadn't seen it! Captioning images with a VLM would definitely help as an additional conditional feature. Maybe it would even be enough to search using only the embeddings of the captions!

We chose aerial/satellite imagery instead of street view because we plan to apply the same technology where street view is not available, e.g. crop fields or forests. Another reason is that we plan to monitor areas that change frequently, and street view data is not updated often enough to keep up.

But the idea is great! Although your query "palace of fine arts" is not extremely exciting, because it is searchable via Google Maps :D

"USF" by itself doesn't work, but "USF word" pointed me where needed xD

"beach" and "picnic tables" indeed don't work in object mode, but they work great in "big" mode, probably because they need some context around them.

"lots of people" didn't work, while "a crowd of people" seems to work. Interesting that semantically almost identical queries produce very different results!
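To make the caption-only idea a bit more concrete, here is a rough sketch of what it could look like. Everything in it is an assumption for illustration: the tile ids and captions are made up, and `embed` is a toy bag-of-words stand-in for a real text-embedding model, not what our demo actually uses.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a text-embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, captions, top_k=3):
    """Rank tiles by similarity between the query and each tile's caption."""
    q = embed(query)
    scored = [(cosine(q, embed(caption)), tile_id)
              for tile_id, caption in captions.items()]
    # Highest score first; ties broken by tile id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(tile_id, score) for score, tile_id in scored[:top_k] if score > 0]

# Hypothetical captions, as a VLM might produce them for image tiles.
CAPTIONS = {
    "tile_001": "a sandy beach with waves and a parking lot",
    "tile_002": "picnic tables under trees in a park",
    "tile_003": "a crowd of people gathered on a lawn",
}

if __name__ == "__main__":
    for query in ("beach", "picnic tables", "a crowd of people"):
        print(query, "->", search(query, CAPTIONS, top_k=1))
```

With a real embedding model in place of `embed`, the captions would be embedded once up front and only the query at search time; the ranking step would stay the same.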