
by moffkalast 1080 days ago

> 60B parameters is still smaller than what GPT4 is using

I mean, if the article is right, then it's only about 3.3% of the size of GPT-4 (although GPT-4 is a sparse model, so not all of its parameters are used on every pass).
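
To see what total size that percentage implies, the ratio can simply be inverted. This is a rough sketch that uses only the two figures quoted above (60B and about 3.3%). The variable names are my own, and the result is only as reliable as the article's claim:

```python
# Back out the total parameter count implied by the two figures above.
# Both inputs come from the comment; nothing here is an official number.
smaller_model_params = 60e9   # "60B parameters" from the quoted text
ratio = 0.033                 # "about 3.3%" from the comment

implied_gpt4_params = smaller_model_params / ratio
print(f"implied total: {implied_gpt4_params / 1e12:.2f}T parameters")
```

Since the model is sparse, the number of parameters active on any single pass would be lower than this total.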

Meta also doesn't seem to have trained the LLaMA models on nearly as much code, so they're much worse at coding in general.