In testing for my workflows, Copilot significantly underperforms the SOTA agents, even when using the exact same models. It's not particularly close either.
This has led to two classes of devs at my company: a) the AI-hesitant, for many of whom Copilot is their only interaction with AI, and who are having their worst fears confirmed about how bad AI is; b) AI enthusiasts who are irritated at dealing with management that doesn't know the difference and pushes back on their asks for access to SOTA agents.
If I were the frontier labs, and weren't billions of dollars beholden to Microsoft, I'd cut Copilot off. It poisons the well for adoption of their other systems. I don't deal with the other Copilots besides the coding agent variants, but I hear similar things about the business application variants.
Microsoft's AI reputation is in the toilet right now. I'm not sure if it's understood how bad it really is within the org.
Interesting - these head-to-head comparisons you’re doing with the same model - what harnesses are you comparing, say Claude Code / Codex versus Copilot CLI?
> I'm not sure if it's understood how bad it really is within the org.
I can’t speak to that, but there’s a lively culture of people using internal tooling who also extensively use 3p products on projects outside work and are in a reasonable position to assess how well GH copilot works.
Yeah, I’m only interested in CLI and non-interactive agent usage. I don’t compare, say, the VS Code plugins, but I do regularly compare, say, GitHub code reviews.
Those comparisons, for instance, have made us turn _off_ Copilot pull requests entirely. All of the agents have false positives (as do humans), but Copilot had negative value in that context.
I’ve only started using it, so maybe I’m holding it wrong, but the other day I asked the IntelliJ plugin to explain two lines of code by referencing the line numbers. It printed and explained two entirely different lines in a different part of the file. I asked again. It picked two lines somewhere else.
After using ChatGPT for the last 6 months or so, Copilot feels like a significant downgrade. On the other hand, it did easily diagnose a build failure I was having, so it’s not useless, just not as helpful.
Sure, I love Claude Code too - I use it plenty outside of work. But funnily enough, I’ve been asking myself whether to get my org on board with internal Claude Code trials and was struggling to truly articulate what we were losing versus the Copilot CLI. There are some feature gaps, but the pace of work is super and the experience is pretty good for me.
No one should hit Microsoft over the head for giving people access to Claude Code - choice and competition are good!