by marci
763 days ago
I understand. I'm just glad about the possible implications for future models: less expensive to make means less expensive to iterate on, and MoEs are cheaper to train. My favorite right now is Wizard 8x22b, so as a random user, I don't really care about this model and will probably never run it as-is. But it makes me hope for a Falcon-MoE. Also, the fact that it's less dense than Llama 3 means there may be more room for LoRA fine-tuning, at a lower cost than Llama 3 requires and while sacrificing far less of its smarts. That may be my use case. |
|