It just goes through cycles of getting better and then getting worse again, presumably depending on how much Anthropic are having to optimise inference.