HN Mirror



	by marvinkennis 1191 days ago
	I've been seeing a lot of text-to-image work recently. Does anyone know what the current state of the art is for image-to-text? I'm thinking of something similar to Midjourney's /describe command, which they added in v5.

2 comments

mkaic 1191 days ago

While it's not publicly available yet, I have strong suspicions that multimodal GPT-4 may actually be SOTA in image-to-text. The examples shown in the Sparks of AGI paper were extremely impressive imo, though of course those are cherry-picked so it's unclear how well the model will perform on non-cherry-picked images.


jah242 1191 days ago

This is text + image -> text, but it's pretty cool and might still be of interest to you:

https://llava-vl.github.io


marvinkennis 1191 days ago

Just entering "Describe this image" in the chat prompt got me exactly what I was looking for. Thanks!
