It also means that if they actually trained with vision, they'd be on par with Anthropic models, since vision seems to improve model performance across the board, even on non-vision tasks.
Many other open-source models have vision, but they don't compare to GLM in terms of coding quality. So I don't think vision is why the frontier models are better; they are probably just much bigger models.