Y
Hacker News
new
|
ask
|
show
|
jobs
by
viraptor
879 days ago
For models with dropped weights, the keyword is "distilled". For example ssd-1b is a 50% size version of Stable Diffusion XL (
https://huggingface.co/segmind/SSD-1B
)
1 comments
sp332
879 days ago
That’s crazy, I’ve never seen one that dropped whole layers from a pre-trained model. I guess that avoids the sparse matrix math.
link