by rsaha7 803 days ago

Thanks for the feedback!

1. The largest model we have tested is Llama 2 13B. For the first phase, we focused on fine-tuning LLMs in the 1B-13B range. For the next phase, we will focus on roughly 13B-45B, which will require us to incorporate distributed training techniques.

2. Once distributed training techniques are incorporated, we will be able to run MoE-based models such as Mixtral.
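
A rough back-of-envelope sketch of why the 13B-45B range likely needs distributed techniques. It assumes full fine-tuning with Adam in mixed precision, which is commonly estimated at about 16 bytes of model state per parameter, and an 80 GB GPU. Both numbers and the helper names (`training_state_gb`, `min_gpus`) are assumptions for illustration, not details of our setup; parameter-efficient methods such as LoRA would need far less.

```python
import math

# Assumption: mixed-precision Adam keeps roughly 16 bytes of state per parameter:
# 2 (fp16 weights) + 2 (fp16 gradients) + 12 (fp32 master weights, momentum, variance).
BYTES_PER_PARAM = 2 + 2 + 12


def training_state_gb(num_params: float) -> float:
    """Approximate model-state memory in GB (10^9 bytes), excluding activations."""
    return num_params * BYTES_PER_PARAM / 1e9


def min_gpus(num_params: float, gpu_mem_gb: float = 80.0) -> int:
    """Lower bound on GPUs needed just to hold the model state (hypothetical 80 GB cards)."""
    return math.ceil(training_state_gb(num_params) / gpu_mem_gb)


if __name__ == "__main__":
    for name, n in [("1B", 1e9), ("13B", 13e9), ("45B", 45e9)]:
        print(f"{name}: ~{training_state_gb(n):.0f} GB of model state, "
              f"at least {min_gpus(n)} GPU(s)")
```

Under these assumptions a 13B model already exceeds a single 80 GB card for full fine-tuning, and the estimate ignores activation memory, so the real requirement is higher.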