| HN Mirror


	by kevc 214 days ago
	It feels like we are pretty far away from LLMs running a concession stand (see Andon Labs), so I'm not surprised it would struggle here. Still, the failure modes are super interesting, and having benchmarks seems to be the starting point for domain-specific improvements.