Hacker News
by stared 32 days ago
I tried running it, but the estimate comes out at 24–33T parameters; see https://gist.github.com/stared/a86d7380937e6d0ab7920014866ac....

That looks like a huge overshoot. Compare the Hy3 model, which this model puts at 2.4T parameters, while it actually has 295B.
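
For a rough sense of scale, here is a quick sketch of the arithmetic in Python, using only the figures quoted above. The function name `overshoot_factor` is made up for illustration, and the last step assumes the Hy3 overshoot carries over unchanged to the 24–33T estimate, which is a guess on my part and not something the data shows.

```python
# Figures quoted in the comment above, as raw parameter counts.
HY3_ESTIMATED = 2.4e12  # what the estimate claims for Hy3
HY3_ACTUAL = 295e9      # Hy3's actual size


def overshoot_factor(estimated: float, actual: float) -> float:
    """Return how many times larger the estimate is than the actual value."""
    return estimated / actual


factor = overshoot_factor(HY3_ESTIMATED, HY3_ACTUAL)
print(f"Hy3 overshoot: {factor:.1f}x")

# ASSUMPTION (not from the comment): the same overshoot factor applies
# to the 24-33T estimate. This is a naive rescaling, not a measurement.
low, high = (x / factor for x in (24e12, 33e12))
print(f"Naively rescaled range: {low / 1e12:.2f}T to {high / 1e12:.2f}T")
```

If the overshoot is not a constant factor across models, the rescaled range means little; it only shows how far off the raw estimate could be.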