HN Mirror



	by Q6T46nT668w6i3m 980 days ago
	Neat idea! Are the patches encoded as tokens in the input sequence? That is something I really like about the multi-modal PaLM papers, since it allows the multi-modal tokens to be referenced.

1 comment

Image patches are projected directly into an embedding that goes into the decoder Transformer. The same thing could be done for audio.
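
To make that concrete, here is a rough sketch of what "projected directly into an embedding" could look like. This is not the project's actual code: the image size, patch size, embedding width, random weights, and the helper names `extract_patches` and `project` are all assumptions for illustration, and it uses plain Python lists to stay dependency-free. It also assumes the image embeddings are simply placed ahead of the text token embeddings in the decoder input, which the comment above does not spell out.

```python
import random


def extract_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch row by row, pixel by pixel, channel by channel.
            flat = [
                channel
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for channel in pixel
            ]
            patches.append(flat)
    return patches


def project(patches, weights, bias):
    """Linear projection: weights holds one row of length patch_dim per output dimension."""
    return [
        [sum(x * w for x, w in zip(vec, row)) + b for row, b in zip(weights, bias)]
        for vec in patches
    ]


# Hypothetical sizes, chosen only to keep the example small.
HEIGHT = WIDTH = 8
CHANNELS = 3
PATCH = 4
D_MODEL = 16

rng = random.Random(0)
image = [
    [[rng.random() for _ in range(CHANNELS)] for _ in range(WIDTH)]
    for _ in range(HEIGHT)
]

# Randomly initialised projection; in a real model these would be learned.
patch_dim = PATCH * PATCH * CHANNELS
weights = [[rng.gauss(0.0, 0.02) for _ in range(patch_dim)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

image_embeddings = project(extract_patches(image, PATCH), weights, bias)

# Stand-in for text token embeddings that would come from a lookup table.
text_embeddings = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(5)]

# Assumption: the decoder sees one sequence with image embeddings first.
decoder_input = image_embeddings + text_embeddings
```

Each patch becomes one position in the sequence with no discrete vocabulary involved, so audio would only need its own way of cutting the signal into frames before the same kind of projection.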