Hacker News
by ca_tech 949 days ago
They do mention that they expect the 70B model to perform even better. I expect that you are correct and that they determined the 13B was capable enough to serve as a base model. Why incur additional training time before getting preliminary results?