by embedding-shape
3 hours ago
> - And whose inference benefits are negated at scale. Transformers are still cheaper if you want to serve lots of users.

But my entire point is about the reverse of this: the context of what I bring up is single-user scenarios, which is where these diffusion models really make a large difference in performance. Sure, I agree it's not a good fit for every single use case out there. But after starting to play around with it more closely myself, I think people are dismissing it a bit too quickly, at least if you're interested in running local models on your own hardware.
Since training models is currently a very expensive procedure, diffusion LLMs are destined to be relegated to the occasional research artifact at best. As things stand, making a serious commitment to them is basically the equivalent of throwing money into a fire pit, and things are expensive enough as is.
Alternative architectures that do a much better job of matching transformers in quality have basically gone nowhere, yet you expect one that is basically worse in every way the labs care about to fare better? I'm not trying to 'dismiss' dLLMs. I'm interested in them for the same reason you are. I'm just stating the factors at play plainly.