by Alex-Programs
483 days ago
This is a crazy paper. A first-generation diffusion model is beating Llama 3 in some areas, and Llama 3 is a model with a huge amount of tuning and improvement work behind it. And it's from China again! A whole new "tree" of development has opened up. With so many possibilities - traditional scaling laws, out-loud chain of thought, in-model layer-repeating chain of thought, and now diffusion models - it seems unlikely to me that LLMs are going to hit a wall that the river of technological progress cannot flow around. I wonder how well they'll work at translation. The paper indicates that they're rather good at poetry. Interesting times.