Hacker News
by cbuskilla 2230 days ago
Sure! It's the 90M-parameter model. They trained models up to almost 10B parameters, so I guess it gets better with size (I didn't try them, way too expensive).

And I agree about the ALICE derivatives: Mitsuku is nice without doing anything fancy.