by rsaha7 803 days ago
Thanks for the feedback!

1. The largest model that we have tested is Llama2 13B. For the first phase, we focussed on fine-tuning LLMs in the 1B-13B range. For our next phase, we will focus on roughly the 13B-45B range, which will require us to incorporate distributed training techniques.

2. Once distributed training techniques are incorporated, we will be able to run MoE-based models such as Mixtral.