Hacker News
by rasbt 841 days ago
Yes, it's somewhat similar to the 2B model, since it uses the same vocabulary size.