| HN Mirror


	by gommm 388 days ago
	This looks like LLM blog spam. To be taken seriously, they'd need to publish the implementation of each benchmark in each language, which they didn't. Instead they show pseudocode with very vague descriptions of failure modes that don't really make sense: "Under our error cascade simulation, some low-level failures in unsafe code regions propagated in ways that eventually caused deadlocks in resource management." That gives no details, and "failures in unsafe code regions" doesn't sound like a realistic failure case.