by albertan017
822 days ago
Yes, it's not easy to train a 33B model. An interesting point is "naive fine-tuning", which means one simply followed the standard way to fine-tune the model. Training a larger model is tricky: not only does the amount of data matter, but everything from data cleaning to the learning rate and decay settings will affect the final performance.