by sp332 877 days ago
That’s crazy, I’ve never seen one that dropped whole layers from a pre-trained model. I guess that avoids the sparse matrix math.
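
To make the guess about sparse matrix math concrete, here's a rough toy sketch (not the method from the linked work, just an illustration). It assumes a "model" is nothing more than a list of dense square weight matrices, and the helper names `make_model`, `drop_layers`, `prune_weights`, and `count_zeros` are all made up for this example. Dropping layers leaves fewer matrices that are still fully dense, while magnitude pruning keeps every layer but fills it with zeros, which is where sparse kernels would normally come in.

```python
import random


def make_model(num_layers, dim, seed=0):
    """Build a toy 'model': a list of dense dim x dim weight matrices (hypothetical stand-in)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
        for _ in range(num_layers)
    ]


def drop_layers(layers, keep):
    """Layer dropping: keep only the layers at the given indices, untouched."""
    return [layers[i] for i in keep]


def prune_weights(layers, threshold):
    """Magnitude pruning: zero out individual weights whose absolute value is below threshold."""
    return [
        [[w if abs(w) >= threshold else 0.0 for w in row] for row in matrix]
        for matrix in layers
    ]


def count_zeros(layers):
    """Count weights that are exactly zero across all layers."""
    return sum(1 for matrix in layers for row in matrix for w in row if w == 0.0)


model = make_model(num_layers=6, dim=4)

# Keep every other layer: half the depth, and each remaining matrix is still dense.
dropped = drop_layers(model, keep=[0, 2, 4])

# Same depth as the original, but the matrices are now full of zeros.
pruned = prune_weights(model, threshold=0.5)

print(len(dropped), count_zeros(dropped))
print(len(pruned), count_zeros(pruned))
```

The dropped model can run on ordinary dense matrix multiplies with no special handling. The pruned one only gets faster if something actually exploits the zeros, and in a real network the remaining layers would presumably need fine-tuning after the drop, which this toy ignores.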