Hacker News
by ekelsen 980 days ago
Image patches are projected directly into an embedding that goes into the decoder Transformer. The same thing could be done for audio.
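To make that concrete, here is a rough sketch of the idea in plain Python, not anyone's actual implementation. Every name in it (`patchify`, `frame_audio`, `linear`, `init_linear`, `D_MODEL`) and every size is an assumption picked for illustration: the image is cut into patches, each patch is flattened and passed through a single linear projection, and the resulting vectors are what the decoder would consume. The audio version assumes non-overlapping frames of raw samples stand in for patches.

```python
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])  # append all channels of one pixel
            patches.append(flat)
    return patches


def frame_audio(samples, frame):
    """Hypothetical audio analogue: non-overlapping frames of raw samples."""
    n = len(samples) // frame  # a trailing partial frame is dropped
    return [samples[i * frame:(i + 1) * frame] for i in range(n)]


def init_linear(rng, d_in, d_out):
    """Random weights for one linear layer (illustrative initialisation only)."""
    scale = d_in ** -0.5
    weight = [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]
    bias = [0.0] * d_out
    return weight, bias


def linear(vectors, weight, bias):
    """Apply the same affine map to every vector: out = v @ weight + bias."""
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) + bias[j]
         for j in range(len(bias))]
        for v in vectors
    ]


D_MODEL = 16  # assumed decoder width, chosen only to keep the example small
rng = random.Random(0)

# An 8x8 RGB image cut into 4x4 patches gives 4 patches of 4*4*3 = 48 values.
image = [[[rng.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]
patches = patchify(image, 4)
w_img, b_img = init_linear(rng, 48, D_MODEL)
image_tokens = linear(patches, w_img, b_img)

# 100 audio samples cut into frames of 20 gives 5 frames.
audio = [rng.uniform(-1.0, 1.0) for _ in range(100)]
frames = frame_audio(audio, 20)
w_aud, b_aud = init_linear(rng, 20, D_MODEL)
audio_tokens = linear(frames, w_aud, b_aud)

# Both modalities end up as D_MODEL-wide vectors in one decoder input sequence.
decoder_input = image_tokens + audio_tokens
```

There is no separate encoder here: each modality gets one projection matrix, and everything after that is the decoder's job. A real model would use a tensor library and would probably add position information, which this sketch leaves out.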