| HN Mirror



	by vjsrinivas 100 days ago
	Great work, and I love the detailed breakdown. This is somewhat tangential, but it reminded me of this paper: https://arxiv.org/pdf/2310.12973 (Frozen Transformers in Language Models Are Effective Visual Encoder Layers). It puts forward an interesting hypothesis: that these LLM-derived transformer layers can "refine" any set of learned tokens, even in other modalities. I wonder if what you're seeing here is related?