LLaVA-Plus: Large Language and Vision Assistants That Learn to Use Skills (llava-vl.github.io)
1 point by readyplayeremma 950 days ago
1 comment

LLaVA-Plus maintains a skill repository containing a wide range of pre-trained vision and vision-language models (tools). Given a user's multimodal input, it activates the relevant tools and composes their execution results on the fly to fulfill many real-world tasks.
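
To make the "skill repository" idea concrete, here is a rough Python sketch of the general pattern: register tools, pick the ones relevant to a request, and merge their outputs. This is not LLaVA-Plus code. Every name in it (Skill, SkillRepository, build_demo_repo, fulfill) is hypothetical, the tools are stubs that return placeholder strings, and the keyword matching is only a stand-in assumption for whatever the model actually does to decide which tools to call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    """A hypothetical tool entry; not an actual LLaVA-Plus type."""
    name: str
    triggers: List[str]        # keywords standing in for the model's own tool choice
    run: Callable[[str], str]  # takes an image reference, returns a text result


class SkillRepository:
    """Holds registered skills and selects the ones relevant to a request."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def select(self, instruction: str) -> List[Skill]:
        # Assumption: simple substring matching replaces learned tool selection.
        text = instruction.lower()
        return [
            skill
            for skill in self._skills.values()
            if any(trigger in text for trigger in skill.triggers)
        ]


def build_demo_repo() -> SkillRepository:
    """Builds a repository with two stub tools that only return placeholders."""
    repo = SkillRepository()
    repo.register(
        Skill("detection", ["detect", "find"], lambda img: f"[stub] boxes for {img}")
    )
    repo.register(
        Skill("captioning", ["describe", "caption"], lambda img: f"[stub] caption for {img}")
    )
    return repo


def fulfill(repo: SkillRepository, instruction: str, image: str) -> Dict[str, str]:
    """Runs every selected skill on the image and collects results by skill name."""
    return {skill.name: skill.run(image) for skill in repo.select(instruction)}
```

Calling fulfill(build_demo_repo(), "Detect the dogs and describe the scene", "photo.jpg") would run both stub tools and return their placeholder results keyed by skill name. In a real system, a language model would presumably consume those results to write the final answer; that step is left out here.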