by sp332 877 days ago
That’s crazy, I’ve never seen one that dropped whole layers from a pre-trained model. I guess that avoids the sparse matrix math, since the layers that remain are still ordinary dense matrices.
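
To make the contrast concrete, here is a rough toy sketch in plain Python. It is not the method from the work under discussion: the model, layer sizes, the `keep` indices, and the pruning threshold are all made up for illustration. Dropping layers just shortens the list of dense matrices, while magnitude pruning keeps every layer and fills it with zeros, which only pays off if you have sparse kernels.

```python
import random

def make_layer(n, rng):
    # Hypothetical dense n x n weight matrix with random entries.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

def forward(layers, x):
    # Plain dense matrix-vector product followed by ReLU, layer by layer.
    for w in layers:
        x = [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x

def drop_layers(layers, keep):
    # Layer dropping: keep only the chosen layers; each one stays dense.
    return [layers[i] for i in keep]

def magnitude_prune(layers, threshold):
    # Weight pruning: same depth and shapes, small weights set to zero.
    return [[[v if abs(v) >= threshold else 0.0 for v in row] for row in w]
            for w in layers]

def count_zeros(layers):
    # Number of exactly-zero weights across all layers.
    return sum(v == 0.0 for w in layers for row in w for v in row)

rng = random.Random(0)
model = [make_layer(4, rng) for _ in range(6)]   # toy 6-layer "pre-trained" model

shallow = drop_layers(model, keep=[0, 2, 4])     # assumed choice of layers to keep
sparse = magnitude_prune(model, threshold=0.5)   # assumed pruning threshold

print(len(shallow), count_zeros(shallow))  # fewer layers, no zeros introduced
print(len(sparse), count_zeros(sparse))    # same depth, many zeroed weights
```

The shallow model runs through the same dense `forward` with half the matrix multiplies. The pruned model still does all six full multiplies unless the zeros are exploited by a sparse format.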