


	by tracyhenry 1018 days ago
	Maybe someone more informed can help me understand why they didn't compare against Llava (https://llava-vl.github.io/)?

2 comments

kolja005 1018 days ago

The purpose of this research is to compare large vision-language models where the vision component is pre-trained using different techniques, namely on image classification versus unsupervised contrastive pre-training (see OpenAI's CLIP). PaLI-3 also isn't an instruction-tuned model, so comparing it to Llava would be a little apples-to-oranges.
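To make the distinction between the two pre-training styles concrete, here is a minimal pure-Python sketch of the two objectives. It is only an illustration in the spirit of CLIP, not the exact loss used in the PaLI-3 paper; the function names, the toy embeddings, and the temperature value of 0.07 are all assumptions made for the example.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def classification_loss(logits, labels):
    # Supervised pre-training: cross-entropy against a fixed set of class labels.
    picked = [log_softmax(row)[y] for row, y in zip(logits, labels)]
    return -sum(picked) / len(labels)

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # CLIP-style pre-training: no class labels, only (image, caption) pairs.
    # The i-th image should score highest against the i-th caption, and vice versa.
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    sim_t = [list(col) for col in zip(*sim)]  # caption-to-image direction
    targets = list(range(len(imgs)))
    return 0.5 * (classification_loss(sim, targets)
                  + classification_loss(sim_t, targets))

# Toy batch (hypothetical embeddings): matched pairs vs. shuffled captions.
images = [[1.0, 0.0], [0.0, 1.0]]
matched = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

On the toy batch, `matched` comes out much smaller than `shuffled`, since the contrastive objective rewards each image for being closest to its own caption. The classification objective instead needs an explicit label per image.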


dartos 1018 days ago

Maybe they just didn’t know about llava while conducting their research. It can take days to train a model sometimes.


buildbot 1018 days ago

Weeks to months at larger scales even.
