by raghavgupta0296 802 days ago
Sounds like a great library for automatically testing the new models released every day and finding out whether a new open source model performs significantly better on your custom dataset.

1. What's the largest model (number of parameters) that you've tested the library with?

2. Will MoE models work as well? They're known to be less stable to train and to need some custom techniques to stabilize them.

1 comment

Thanks for the feedback!

1. The largest model we have tested is Llama2 13B. In the first phase, we focused on fine-tuning LLMs in the 1B-13B range. In the next phase, we will focus on roughly the 13B-45B range; for this we will have to incorporate distributed training techniques.

2. Once distributed training techniques are incorporated, we will be able to run MoE-based models, such as Mixtral.