| I would note the actual leading models right now (IMO) are: - Miqu 70B (General Chat) - Deepseed 33B (Coding) - Yi 34B (for chat over 32K context) And of course, there are finetunes of all these. And there are some others in the 34B-70B range I have not tried (and some I have tried, like Qwen, which I was not impressed with). Point being that Llama 70B, Mixtral and Grok as seen in the charts are not what I would call SOTA (though mixtral is excellent for the batch size 1 speed) |
So it's fair to say that DBRX is the leading general purpose model that can be used commercially.