by freakynit 19 days ago
This was originally a 400+B-parameter model, later cut down to 295B, which was considered the "optimal zone".

https://www.mdshare.online/s/uend0pj3og_A_rgcxzINf