by jiggawatts 895 days ago
It has more parameters in total, but not all of them are used during inference. They compared models that use an equal number of parameters at inference time, not models with equal total parameter counts.
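
To make the total-vs-used distinction concrete, here is a rough sketch. It assumes a sparse, mixture-of-experts style layout, which the comment doesn't actually state; the function name `param_counts` and all the numbers are made up for illustration and aren't taken from any real model.

```python
def param_counts(shared: int, per_expert: int,
                 num_experts: int, experts_per_token: int) -> tuple[int, int]:
    """Return (total, active) parameter counts for a hypothetical sparse model.

    Assumption: every token uses all `shared` parameters plus only
    `experts_per_token` of the `num_experts` expert blocks.
    """
    if not 0 < experts_per_token <= num_experts:
        raise ValueError("experts_per_token must be in 1..num_experts")
    total = shared + per_expert * num_experts          # what has to be stored
    active = shared + per_expert * experts_per_token   # what one token touches
    return total, active


# Illustrative numbers only.
total, active = param_counts(
    shared=2_000_000_000,
    per_expert=5_000_000_000,
    num_experts=8,
    experts_per_token=2,
)
print(f"total={total:,} active={active:,}")
```

Under those assumptions, a fair comparison in the sense above would pit this model against a dense one sized to the `active` count, not the `total` count.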