by obblekk 695 days ago
Worth noting that this model has 50% more parameters than Llama 3. It does show performance gains, but some of them may simply come from spending more compute rather than from better performance per unit of compute.
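A rough way to sanity-check this is to compare the relative score gain against the relative parameter increase. The sketch below is only illustrative: the function name and the scores are hypothetical, not real benchmark results, and it uses parameter count as a crude stand-in for compute (ignoring training tokens and the fact that scores rarely scale linearly with size).

```python
def gain_per_param_ratio(base_score: float, new_score: float, param_ratio: float) -> float:
    """Relative score gain divided by relative parameter increase.

    param_ratio is new_params / base_params, e.g. 1.5 for "50% more parameters".
    A value well below 1 means the score grew much more slowly than the model size.
    """
    if param_ratio <= 1.0:
        raise ValueError("param_ratio must be greater than 1")
    relative_gain = (new_score - base_score) / base_score
    relative_param_increase = param_ratio - 1.0
    return relative_gain / relative_param_increase


# Hypothetical scores for illustration only (not measured results).
ratio = gain_per_param_ratio(base_score=60.0, new_score=66.0, param_ratio=1.5)
print(f"gain per unit of added size: {ratio:.2f}")
```

With made-up numbers like these, a 10% score gain from a 50% larger model would say little about efficiency on its own; a comparison at matched size or matched training compute would be more telling.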