| HN Mirror

	by natsucks 987 days ago
	Why no multi-turn evaluation? A lot of these benchmarks fail to capture the strength of ghost attention used in Llama 2 chat models.