The onus isn’t on me. It’s on anyone contradicting what most benchmarks find, because most of them show a clear advantage for Opus and GPT over OSS models.
What's amazing is that LLM technologies are so immature that even basic engineering diligence isn't being done. (Like detecting token loops, for example.)
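To make the token-loop point concrete, here's a rough sketch of the kind of check I mean. It assumes you can see the generated token IDs as a plain list; the function name `detect_token_loop` and the `max_period` / `min_repeats` parameters are made up for illustration and don't come from any particular inference stack.

```python
from typing import Optional, Sequence


def detect_token_loop(
    tokens: Sequence[int],
    max_period: int = 32,   # hypothetical default: longest repeating block to look for
    min_repeats: int = 4,   # hypothetical default: how many back-to-back copies count as a loop
) -> Optional[int]:
    """Return the shortest period p for which the last p * min_repeats tokens
    are one p-token block repeated back to back, or None if there is no such p."""
    n = len(tokens)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # not enough tokens yet to see this many repeats
        tail = tokens[n - span:]
        # The tail is periodic with this period if every token equals
        # the token exactly `period` positions before it.
        if all(tail[i] == tail[i - period] for i in range(period, span)):
            return period
    return None
```

You'd call this after each sampled token and stop or resample when it returns a period. It only catches exact repeats at the end of the output, so near-duplicate loops with small variations would need something fuzzier.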