If you train several models with dropout, each from a different random seed, you have an ensemble. Models trained with dropout still vary significantly from run to run.
It's a common interpretation: https://arxiv.org/abs/1706.06859
In particular, this paper neglected to do the obvious thing: ensemble networks trained with dropout. Doing so improves performance over dropout alone.
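To make "ensemble networks trained with dropout" concrete, here is a minimal NumPy sketch. Everything in it is my own assumption for illustration, not anything from the paper: the toy XOR-style data, the tiny one-hidden-layer network, the hyperparameters, and the helper names `train_mlp_with_dropout` and `predict_proba`. It trains a few networks with dropout from different seeds and averages their predicted probabilities.

```python
import numpy as np

def train_mlp_with_dropout(X, y, seed, hidden=16, p_drop=0.5, lr=0.5, epochs=300):
    """Train a one-hidden-layer binary classifier with inverted dropout (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(X)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        # Inverted dropout: zero units with probability p_drop, rescale the rest.
        mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
        hd = h * mask
        p = 1.0 / (1.0 + np.exp(-(hd @ W2 + b2)))
        g = (p - y) / n                          # gradient of mean cross-entropy w.r.t. logits
        g_pre = np.outer(g, W2) * mask * (1.0 - h ** 2)  # back through dropout and tanh
        W2 -= lr * (hd.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return W1, b1, W2, b2

def predict_proba(params, X):
    """Predict with dropout switched off, as is usual at test time."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

# Toy data (an assumption for this sketch): label is 1 when both coordinates share a sign.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# Each member differs only in its seed, which drives both the init and the dropout masks.
members = [train_mlp_with_dropout(X, y, seed=s) for s in range(5)]
probs = np.stack([predict_proba(m, X) for m in members])  # shape (5, 200)

# The ensemble prediction is the plain average of the members' probabilities.
ensemble_probs = probs.mean(axis=0)

# Per-example spread across members shows the run-to-run variation.
member_spread = probs.std(axis=0)
```

Whether the averaged prediction actually beats a single dropout network on a given task is an empirical question; this sketch only shows the mechanics, not a result.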