| HN Mirror



	by santiagobasulto 13 days ago
	Not at all. I had the same feeling you did the first time I read it. I think the key is that the "encoder" they're using is just a linear projection, which is probably pretty fast and memory-efficient. A single matmul instead of a full ViT encoder is probably a huge win.
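
	To put a rough number on "a single matmul vs a ViT encoder", here is a back-of-the-envelope parameter count. The sizes are assumptions picked for illustration (16x16 RGB patches, width 768, and a ViT-Base-sized stack of 12 layers with MLP width 3072); they are not taken from the paper being discussed, and the function names are hypothetical.

```python
# Rough parameter counts: one linear patch projection vs. a ViT-Base-sized encoder.
# All sizes below are illustrative assumptions, not figures from the paper.

def linear_projection_params(patch=16, channels=3, d_model=768):
    """Weights plus bias of a single matmul mapping a flattened patch to d_model."""
    patch_dim = patch * patch * channels
    return patch_dim * d_model + d_model

def vit_encoder_params(layers=12, d_model=768, d_mlp=3072):
    """Approximate weights in the transformer blocks (biases and layer norms ignored)."""
    attention = 4 * d_model * d_model  # Q, K, V and output projections
    mlp = 2 * d_model * d_mlp          # the two feed-forward matrices
    return layers * (attention + mlp)

linear = linear_projection_params()
vit = vit_encoder_params()
ratio = vit / linear

print(f"linear projection:      {linear:,} params")
print(f"ViT-Base-sized encoder: {vit:,} params (~{ratio:.0f}x more)")
```

	Under these assumptions the transformer stack comes out at over a hundred times the size of the projection. A ViT also starts with a patch projection of its own, so the linear "encoder" is roughly a ViT with everything after the first step removed. This only counts weights; actual speed and memory also depend on sequence length, since attention cost grows with the number of patches.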