Hacker News
by natsucks 987 days ago
Why no multi-turn evaluation? A lot of these benchmarks fail to capture the strength of ghost attention used in Llama 2 chat models.