	by emp17344 6 days ago
	Seems like it literally popped up yesterday with the express purpose of building hype for this release.

4 comments

osti 6 days ago

And a notable absence of the DeepSWE benchmark, where they do badly, yet somehow a benchmark that was published yesterday is in this announcement.

zzleeper 6 days ago

Exactly. A bit of a red flag for me.

swyx 6 days ago

team member here - we had been working on frontiercode for ~6-7 months. the timing just lined up

emp17344 6 days ago

Yeah, right. If this benchmark was truly developed in an independent manner, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark is revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.

oblio 6 days ago

Come on, why are you a jerk about this?

Nobody would have 800+ billion reasons to lie by commission or omission here.

vanuatu 6 days ago

i doubt it, Cognition wants coding agents to be better because it directly improves their product

they aren't married to a particular lab, most of their usage is their in-house model i believe

anthonypasq 6 days ago

what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.

bel8 6 days ago

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink 6 days ago

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq 6 days ago

you didn't answer my question. Why would Cognition be biased towards making Anthropic look good?

gloosx 6 days ago

Because Cognition is a major customer of Anthropic?

anthonypasq 5 days ago

they are also a major customer of OpenAI and every other model maker. what's your point?
