The only ones I am aware of are benchmarks on Twitter, Chatbot Arena [1], and the Aider benchmark [2].
[1] https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leade...
[2] https://aider.chat/docs/leaderboards