Y
Hacker News
new
|
ask
|
show
|
jobs
by
Q6T46nT668w6i3m
980 days ago
Neat idea! Are the batches encoded as tokens into the input sequence? This is something I really like about the multi-modal PALM papers since it enables the multi-modal tokens to be referenced.
1 comments
ekelsen
980 days ago
Image patches are projected directly into an embedding that goes into the decoder Transformer. The same thing could be done for audio.
link