by senseiV 868 days ago
Yes, the size is different, but training a diffusion model and training a language model are really different, like how RL models can be small but still take a long time to train.