Hacker News
by yantrams 703 days ago
I tested these problems with llava-v1.6-mistral-7b and the results aren't bad. Maybe I just got lucky with these samples.

Intersecting Lines https://replicate.com/p/s24aeawxasrgj0cgkzabtj53rc

Overlapping Circles https://replicate.com/p/0w026pgbgxrgg0cgkzcv11k384

Touching Circles https://replicate.com/p/105se4p2mnrgm0cgkzcvm83tdc

Circled Text https://replicate.com/p/3kdrb26nwdrgj0cgkzerez14wc

Nested Squares https://replicate.com/p/1ycah63hr1rgg0cgkzf99srpxm

1 comment

These are really interesting examples, thanks for sharing.
You're welcome. I recently noticed that I get better performance from VLMs when queries are phrased this way: as descriptive keys rather than as a problem explained in sentences. Much like CoT reasoning, which many people claim gives better results, I've personally found that querying in a fixed sequence (existenceOfEntity, then numberOfEntities, then propertiesOfEntities, and so on) tends to give better results. I haven't verified any of this rigorously, so please take it with a pinch of salt :)
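
For anyone who wants to try the idea, here is a minimal sketch of how such a query could be assembled. The three key names come from the comment above; the `build_query` helper, the JSON layout, and the instruction wording are all assumptions made for illustration, not the exact prompts used in the linked runs.

```python
import json

# Key order follows the sequence described above: existence first,
# then count, then properties. The JSON layout itself is an assumption.
KEY_ORDER = ["existenceOfEntity", "numberOfEntities", "propertiesOfEntities"]


def build_query(entity: str) -> str:
    """Build a descriptive-key prompt for one entity (hypothetical helper)."""
    # Python dicts keep insertion order, so the keys are emitted in KEY_ORDER.
    template = {key: "" for key in KEY_ORDER}
    return (
        f"Entity: {entity}\n"
        "Fill in the values for these keys, in order, and reply with JSON only:\n"
        + json.dumps(template, indent=2)
    )


if __name__ == "__main__":
    print(build_query("circles"))
```

The resulting string would be sent to the VLM together with the image. Keeping the keys in one ordered list makes it easy to compare this ordering against others, which is the part that still needs rigorous testing.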