by raghavgupta0296
802 days ago
Sounds like a great library for automatically testing the new models being released every day and finding out whether a new open-source model performs significantly better on your custom dataset.
1. What's the largest model (by number of parameters) that you've tested the library with?
2. Will MoE models work as well? They're known to be less stable to train and to need custom techniques to stabilize training.
1. The largest model we have tested is Llama 2 13B. In the first phase, we focused on fine-tuning LLMs in the 1B-13B range. In the next phase, we will focus on models of roughly 13B-45B parameters; for this we will have to incorporate distributed training techniques.
2. Once distributed training techniques are incorporated, we will be able to run MoE-based models such as Mixtral.
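A rough illustration of why the 13B-45B range calls for distributed training: the memory needed just to hold the model weights grows linearly with parameter count. The sketch below is a back-of-envelope estimate under assumptions that are not from the thread: 16-bit weights (2 bytes per parameter) and a single GPU treated as having an 80 GB budget. It ignores activations, gradients, and optimizer state, so real fine-tuning needs more memory than this, and techniques such as quantization change the numbers.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumptions (not from the thread): 16-bit weights (2 bytes/param) and a
# single GPU treated as an 80 GB budget. Activations, gradients, and
# optimizer state are ignored, so actual fine-tuning needs more than this.

BYTES_PER_PARAM_FP16 = 2
GPU_MEMORY_GB = 80  # hypothetical single-GPU budget


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_on_one_gpu(num_params: float, budget_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit within the assumed single-GPU budget."""
    return weight_memory_gb(num_params) <= budget_gb


if __name__ == "__main__":
    for name, params in [("1B", 1e9), ("13B", 13e9), ("45B", 45e9)]:
        gb = weight_memory_gb(params)
        print(f"{name}: {gb:.0f} GB of weights, fits on one GPU: {fits_on_one_gpu(params)}")
```

Under these assumptions a 13B model needs about 26 GB for its weights, while a 45B model needs about 90 GB, which already exceeds the assumed single-GPU budget before any training state is counted. The same arithmetic applies to MoE models such as Mixtral 8x7B (about 46.7B total parameters), since all experts must be held in memory even though only some are active per token.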