by Jackson__ 980 days ago
I too would like to know about the training dataset. I just took a look at the one for LLaVA[0] and found that they used a pretty large number of auto-generated BLIP captions.

This seemed a bit surreal to me, like trying to train an LLM on the outputs of a smaller, worse-performing LLM.

[0] https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#...