| HN Mirror

Hacker News


	by breadsniffer01 1018 days ago
	They could easily have benchmarked on the Spider text-to-SQL test set, but they didn’t. I have a feeling that the more robust models might be the ones that don’t perform best on benchmarks right away.