by andblac 611 days ago
Skimming through the source, it seems to run 'car' and 'person' objects through llava with the following prompts:

- "person": "get gender and age of this person in 5 words or less",

- "car": "get body type and color of this car in 5 words or less".

So YOLO gives the bounding box and rough category, while llava describes the object in more detail.
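
Roughly, I'd guess the flow looks something like the sketch below. This is not the project's actual code: `Detection`, `describe_detections`, and the `describe` callback are hypothetical names I made up, and skipping every class other than 'person' and 'car' is my assumption. Only the two prompt strings come from the source.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple

# Prompts as quoted above, keyed by the detector's class label.
PROMPTS = {
    "person": "get gender and age of this person in 5 words or less",
    "car": "get body type and color of this car in 5 words or less",
}

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


class Detection(NamedTuple):
    """Hypothetical container for one detector result."""
    label: str  # rough category, e.g. from YOLO
    box: Box    # bounding box of the object


def describe_detections(
    detections: Iterable[Detection],
    describe: Callable[[Box, str], str],
) -> List[Tuple[Detection, str]]:
    """Pair each 'person'/'car' detection with a short description.

    `describe` stands in for the vision-language model call (llava in
    the project); it receives the box to crop and the prompt to ask.
    """
    results = []
    for det in detections:
        prompt = PROMPTS.get(det.label)
        if prompt is None:
            # Assumption: classes without a prompt are not described.
            continue
        results.append((det, describe(det.box, prompt)))
    return results
```

Passing the model call in as `describe` keeps the sketch independent of however the project actually invokes llava, which I haven't checked in detail.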