Given the config parameters posted, it's 2 experts per token, so the computation cost per token should be the cost of the component that selects experts + 2× the cost of a 7B model.
Ah, good catch. On closer examination, the attention layers (~2B params) are shared across experts. So in theory you would need 2B for the attention layers + 5B for each of the two active experts in RAM.
That's a total of 12B, meaning this should run on the same hardware as 13B models, with some loading time between generations.
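The arithmetic behind that 12B figure, as a quick sketch. The numbers are just the rough ones quoted in this thread (~2B shared attention, ~5B per expert, 2 experts per token); the function and constant names are made up for illustration, and the router's own cost is ignored.

```python
# Rough figures quoted in the thread, in billions of parameters.
SHARED_ATTENTION_B = 2.0   # attention layers shared by all experts (~2B)
PER_EXPERT_B = 5.0         # expert-specific weights (~5B each)
EXPERTS_PER_TOKEN = 2      # experts selected per token, per the posted config


def active_params_b(shared_b, per_expert_b, experts_per_token):
    """Parameters touched for one token, in billions (router cost ignored)."""
    return shared_b + experts_per_token * per_expert_b


active = active_params_b(SHARED_ATTENTION_B, PER_EXPERT_B, EXPERTS_PER_TOKEN)
print(f"active parameters per token: ~{active:.0f}B")
```

Swapping in more precise parameter counts from the real config would only change the constants, not the formula.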
It will run at the speed of a 7B model while being much smarter, but it will require ~24 GB of RAM instead of ~4 GB (in 4-bit).
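A back-of-the-envelope check on those RAM numbers, as a sketch only. It assumes 8 experts in total (that count is an assumption, not stated in this thread), reuses the thread's rough ~2B shared + ~5B per expert split, and counts weights only: no KV cache, activations, or quantization overhead, so real usage would land somewhat higher than what this prints.

```python
def weight_memory_gb(params_billion, bits_per_param=4):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_param / 8


NUM_EXPERTS = 8            # ASSUMPTION: total expert count, not given in the thread
SHARED_ATTENTION_B = 2.0   # ~2B shared attention params (thread estimate)
PER_EXPERT_B = 5.0         # ~5B params per expert (thread estimate)

# All experts have to be resident, even though only 2 are used per token.
total_params_b = SHARED_ATTENTION_B + NUM_EXPERTS * PER_EXPERT_B

moe_gb = weight_memory_gb(total_params_b)   # full mixture-of-experts model
dense_gb = weight_memory_gb(7.0)            # plain 7B model for comparison

print(f"MoE weights at 4-bit: ~{moe_gb:.1f} GB")
print(f"7B weights at 4-bit:  ~{dense_gb:.1f} GB")
```

The point is the gap: compute per token scales with the active experts, but memory scales with all of them.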