Hacker News
by jart 1170 days ago
I wanted it to be sparse. It doesn't matter if it wasn't. We're already talking about how to modify the training and evaluation to make it sparser. That's the next logical breakthrough in getting inference for larger models running on tinier machines. If you think I haven't done enough to encourage skepticism, then I'd remind you that we all share the same dream of being able to run these large language models on our own hardware. I can't control how people feel, especially not when the numbers reported by our tools are telling us what we want to be true.