| HN Mirror


	by scalalang 372 days ago
	https://arxiv.org/pdf/2407.21771 This research shows that a VLM can be made to pay more attention to the image simply by changing its attention weights.
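
	To make the idea concrete, here is a rough sketch of what "paying more attention to the image" could look like inside a single attention step. This is not taken from the paper: the function name `boost_image_attention`, the `alpha` parameter, and the additive-bias rule are all assumptions for illustration. It just adds a constant to the pre-softmax scores at image-token positions, so the softmax shifts weight toward them.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def boost_image_attention(scores, image_mask, alpha=1.0):
    """Hypothetical helper, not the paper's exact rule.

    scores:     (num_queries, num_keys) pre-softmax attention logits
    image_mask: (num_keys,) boolean, True where the key is an image token
    alpha:      assumed strength of the bias added to image-token logits
    """
    boosted = scores + alpha * image_mask.astype(scores.dtype)
    return softmax(boosted)

# Toy example: one query, two image tokens followed by two text tokens.
scores = np.zeros((1, 4))
image_mask = np.array([True, True, False, False])

before = softmax(scores)
after = boost_image_attention(scores, image_mask, alpha=1.0)

# Share of attention landing on image tokens, before and after the bias.
image_share_before = before[0, image_mask].sum()
image_share_after = after[0, image_mask].sum()
```

	With uniform scores, the image tokens start with half of the attention mass and end up with more once the bias is applied, while each row still sums to 1.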