Hacker News
by
orbital-decay
416 days ago
Does it beat them because it's a transformer, or because it's a much larger end-to-end model with higher-quality multimodal training?
scratchyone
416 days ago
I wonder if it benefits because it can attend to individual tokens of the prompt while generating, compared to typical diffusion models that just get a static vector embedding of the prompt.
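To make the distinction concrete, here is a rough toy sketch of the two conditioning styles: one pooled prompt vector reused at every step, versus attention over the individual prompt tokens. Everything in it is an assumption for illustration (the 2-d embeddings, the three-token prompt, and the helper names `pooled_condition` and `attend`); it is not how any particular model is implemented.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pooled_condition(token_embs):
    # Static conditioning (hypothetical): average all prompt tokens into one vector.
    n = len(token_embs)
    return [sum(col) / n for col in zip(*token_embs)]

def attend(query, token_embs):
    # Per-token conditioning: scaled dot-product attention over the prompt tokens.
    # For simplicity the token embeddings serve as both keys and values.
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in token_embs])
    context = [sum(w * k[i] for w, k in zip(weights, token_embs)) for i in range(d)]
    return weights, context

# Toy 2-d embeddings for a made-up three-token prompt.
prompt = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]

static = pooled_condition(prompt)        # identical at every generation step
w_a, ctx_a = attend([3.0, 0.0], prompt)  # a step whose query aligns with token 0
w_b, ctx_b = attend([0.0, 3.0], prompt)  # a step whose query aligns with token 1
```

In the pooled case every generation step sees the same summary, while in the attention case `ctx_a` and `ctx_b` differ because each step weights the prompt tokens according to its own query.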