It was listed as 20B in a comparison table in a paper co-written by Microsoft, but they've since said that was just an error. And I mean, they'd need to be sitting on some really impressive distillation techniques to shrink a 175B model down to 20B with only a slight drop in performance.
OpenAI have been sitting on GPT-4 for months, and on the base model even longer. I would not be surprised if they did some or all of the distillation with GPT-4.
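Mixtral is nominally 56B combined (8x7B), though the actual total is closer to 47B because the experts share the attention layers. If we subtract a little more for MoE inefficiencies, we could say Mixtral is roughly equivalent to a 40B dense model. That is a 2x increase over 20B, and we have seen new models beat others twice their size.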
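To make the back-of-the-envelope comparison concrete, here is a minimal sketch of the arithmetic. The 40B "effective" figure is the rough guess from the comment, not a measured value, and the constant and function names are made up for illustration.

```python
# Figures in billions of parameters, taken from the comment above.
MIXTRAL_NOMINAL_B = 8 * 7      # 8 experts x 7B, the nominal "56B combined" figure
MIXTRAL_EFFECTIVE_B = 40       # assumption: rough guess after discounting MoE inefficiency
RUMORED_GPT35_B = 20           # the disputed figure from the comparison table


def size_ratio(larger_b: float, smaller_b: float) -> float:
    """Return how many times larger one parameter count is than another."""
    return larger_b / smaller_b


# Fraction of the nominal size that the rough guess discounts away.
discount = 1 - MIXTRAL_EFFECTIVE_B / MIXTRAL_NOMINAL_B
ratio = size_ratio(MIXTRAL_EFFECTIVE_B, RUMORED_GPT35_B)

print(f"MoE discount applied: {discount:.0%}")
print(f"Effective Mixtral vs rumored 20B: {ratio:.1f}x")
```

Changing `MIXTRAL_EFFECTIVE_B` shows how sensitive the "2x" claim is to how much you discount for the MoE architecture.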
That and a massive amount of excellent data for alignment should produce some great results.
I don't think it's out of the realm of possibility that 20B is real.