| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sfaist 335 days ago
	The reason we think this would be interesting to share here is that these llm benchmarks seem increasingly disconnected from reality. idc if the llm can solve a PhD math question or make scientific discoveries, I care if it can solve our problems, which in our case is automating API integrations. Turns out it mostly can't, which tracks well with our experience using cursor.