Hacker News
by anvuong 794 days ago
I actually can't wrap my head around this number, even though I have been working on and off with deep learning for a few years. The biggest models we've ever deployed to production still have fewer than 1B parameters, and latency is already hard to manage during rush hours. I have no idea how they deploy (multiple?) 1.8T-parameter models that serve tens of millions of users a day.
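To put the gap in concrete terms, here is a rough back-of-the-envelope sketch. It assumes weights are stored in 16-bit precision (2 bytes per parameter) and an 80 GB accelerator. Both are assumptions for illustration, since real deployments may quantize or use other hardware. It also counts only the weights and ignores activations, KV cache, and serving overhead. `weight_memory_gb` and `GPU_MEMORY_GB` are hypothetical names.

```python
import math

def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed to store the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and any serving overhead.
    """
    return num_params * bytes_per_param / 1e9

GPU_MEMORY_GB = 80  # assumption: a single 80 GB accelerator

small = weight_memory_gb(1e9)     # the <1B-parameter production model
large = weight_memory_gb(1.8e12)  # the 1.8T figure under discussion
gpus = math.ceil(large / GPU_MEMORY_GB)  # lower bound on devices just to hold weights

print(f"1B params:   {small:,.0f} GB")
print(f"1.8T params: {large:,.0f} GB, at least {gpus} x {GPU_MEMORY_GB} GB devices")
```

Under these assumptions the 1B model fits in about 2 GB, while 1.8T parameters need about 3,600 GB, so the weights must be sharded across dozens of devices before a single request is served.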
1 comment

It's a mixture-of-experts model. Only a small fraction of those parameters is active for any given token. I believe it's 16x110B, i.e. 16 experts of roughly 110B parameters each.
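A minimal sketch of how mixture-of-experts routing keeps most parameters idle. It is a toy in pure Python, not the actual model's implementation. The 16 experts come from the comment's guess. Top-2 routing and the 4-dimensional hidden size are assumptions chosen only for illustration, and each "expert" here is a single random linear map. A router scores every expert for a token, and only the `TOP_K` highest-scoring experts compute anything. Their outputs are mixed with softmax weights taken over the chosen experts.

```python
import math
import random

NUM_EXPERTS = 16  # from the "16x110B" guess
TOP_K = 2         # assumption: route each token to 2 experts
DIM = 4           # toy hidden size, purely illustrative

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Router produces one score per expert; each toy expert is a DIM x DIM linear map.
router = rand_matrix(NUM_EXPERTS, DIM)
experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]

def moe_forward(token):
    """Send one token to its TOP_K highest-scoring experts and mix their outputs."""
    scores = matvec(router, token)
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in top])  # renormalize over chosen experts only
    out = [0.0] * DIM
    for g, i in zip(gates, top):
        y = matvec(experts[i], token)  # only these TOP_K experts do any work
        out = [o + g * v for o, v in zip(out, y)]
    return out, top

token = [1.0, -0.5, 0.25, 2.0]
output, used = moe_forward(token)
print("experts used:", used)

# Rough parameter arithmetic, treating each expert as an independent 110B block
# (a simplification: real MoE models usually share attention layers across experts).
total_params = NUM_EXPERTS * 110e9
active_params = TOP_K * 110e9
print(f"total  ~ {total_params / 1e12:.2f}T")
print(f"active ~ {active_params / 1e9:.0f}B per token")
```

The arithmetic shows why the guess is plausible: 16 × 110B ≈ 1.76T, close to the 1.8T figure. Under the top-2 assumption, only about an eighth of the weights would be used per token. All experts still have to sit in memory, though, so MoE mainly cuts compute per token rather than the storage footprint.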