by rsaha7
803 days ago
Thanks for the feedback!

1. The largest model we have tested is Llama2 13B. For the first phase, we focused on fine-tuning LLMs in the 1B-13B range. For the next phase, we will focus on models in roughly the 13B-45B range, which will require us to incorporate distributed training techniques.
2. Once distributed training techniques are incorporated, we will be able to run MoE-based models such as Mixtral.