Hacker News
by natsucks 987 days ago
Why is there no multi-turn evaluation? A lot of these benchmarks fail to capture the strength of Ghost Attention (GAtt), which the Llama 2-Chat models use to keep following an instruction across the turns of a conversation.