

	by taneq 7 days ago
	They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images you gotta know what things look like.


TZubiri 7 days ago

That's not helpful, my brother. If you have details, share them; if not, don't pretend you are more illuminated than me.

Is the image(text) function reversible? Or are they brute-force searching for a nearest neighbor, like word2vec lookups or hash brute-forcing?
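For concreteness, the "brute-force nearest neighbor" option would look roughly like a word2vec-style lookup: embed the query, then scan every candidate vector and keep the one with the highest cosine similarity. This is a minimal sketch of that generic technique only, not a claim about how these image models actually work. The function names and the toy 2-D vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor(query, candidates):
    """Brute force: compare the query against every candidate and keep the best.

    `candidates` is a dict mapping a label to its (hypothetical) embedding vector.
    Returns (best_label, best_score).
    """
    best_key, best_score = None, -math.inf
    for key, vec in candidates.items():
        score = cosine(query, vec)
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


# Toy embeddings, made up purely for illustration.
toy_vectors = {"cat": [1.0, 0.1], "car": [0.0, 1.0]}
print(nearest_neighbor([0.9, 0.2], toy_vectors))
```

The scan is O(N) per query, which is why real systems switch to approximate indexes once the candidate set gets large.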


sorenjan 7 days ago

Google recently released a paper, "Image Generators are Generalist Vision Learners", about exactly this. They fine-tuned Nano Banana Pro into what they call Vision Banana, which can do segmentation etc.

https://arxiv.org/abs/2604.20329


TZubiri 6 days ago

Very interesting. It seems they use image(image, text) functions to process/filter images, effectively generating an arbitrary bitmap(image), where the bitmap has the same dimensions as the input image.
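One way to picture how a same-sized output image becomes a usable bitmap: if the model is prompted to paint the requested region in a single known color, a pixel-wise color match recovers a binary mask. This is a minimal sketch under that assumption only. The paper's actual output encoding and decoding may differ, and `color_to_mask`, `target`, and `tol` are hypothetical names.

```python
def color_to_mask(image, target, tol=30):
    """Turn a generated RGB image into a binary mask of the same height/width.

    Assumption (hypothetical): the model painted the region of interest in
    the `target` RGB color. A pixel is marked 1 if every channel is within
    `tol` of the target color, otherwise 0.
    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    return [
        [int(all(abs(c - t) <= tol for c, t in zip(pixel, target))) for pixel in row]
        for row in image
    ]


# A tiny 2x2 "generated" image: two near-red pixels, two dark pixels.
generated = [
    [(250, 5, 5), (0, 0, 0)],
    [(10, 10, 10), (240, 20, 15)],
]
print(color_to_mask(generated, target=(255, 0, 0)))
```

The tolerance absorbs the slight color drift a generator introduces; because the mask is computed per pixel, it keeps the input's dimensions by construction.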
