I'm assuming this will have similar scores to the original 40B model, in which case Llama 2 70B would outperform it. The average score of Llama 2 70B instruct on the Open LLM Leaderboard is 72.3.
Falcon-40B scores 63.4, or 61.5 for the non-instruction-tuned version.
It's a good observation: there are so many unknowns about all these models. Every day there's a new wizard_uncensored_rhlf_alpaca_tuned_best_one_use_this_13B_4.6-bit_rqm.pth that gets released, and it's almost impossible to know the relative merits or which ones are worth paying attention to.
How true. Every time I browse a model listing there are four-word descriptions with little sense of versioning, provenance, hardware requirements, or any reason why I would choose one over the other.
The architecture is the same, I believe. It's just a fine-tune, so there's nothing special to be done for this version. That said, ggml doesn't support Falcon, but I saw today that there is a fork that claims to, though I didn't try it.
Right, I'm being stupid, that's the fork I saw earlier today and I didn't realize. Have you tried it? IIRC the documentation mentioned a 2-bit quantization of the 40B model performing well. I've been using a 5-bit 7B Llama 2, which I'm generally happy with (because it can run on a pretty crappy machine), but I'm interested to see the differences.
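For a rough sense of why a 2-bit 40B model is even comparable to a 5-bit 7B one on modest hardware, here's a back-of-the-envelope sketch in Python. It only computes parameters times bits per weight, so treat it as a lower bound: real quantized formats store extra scale data per block, and the KV cache and activations come on top. The parameter counts are the nominal "40B" and "7B" figures, not exact counts, and the function name is made up for illustration.

```python
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough lower bound on weight storage in GiB.

    Assumption: every weight takes exactly `bits_per_weight` bits.
    Ignores per-block scales, KV cache, and activation memory.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal parameter counts (assumed round numbers, not exact).
configs = [
    ("40B @ 2-bit", 40e9, 2),
    ("7B @ 5-bit", 7e9, 5),
    ("40B @ 16-bit", 40e9, 16),
]

for name, n_params, bits in configs:
    print(f"{name}: ~{approx_weight_gib(n_params, bits):.1f} GiB of weights")
```

By this estimate the 2-bit 40B weights come to a bit more than twice the size of the 5-bit 7B weights, and roughly an eighth of the 16-bit 40B weights. It says nothing about output quality at each bit width.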
Kind of. There's https://github.com/Hannibal046/Awesome-LLM and you can follow the subreddit too. They're not amazing though, so I'm in the process of making my own catalogue.
We ditched most of our focus on Falcon 40B after Llama 2 70B came out. Both the tokens per second and the quality of results are not even close.