by ruby314 675 days ago
Re #3 - my bad, I mixed up terminology in my answer above. It’s the “base model” for the evaluator model (vs. a fine-tuned evaluator model). We're just using the labeled HaluBench dataset as the outputs to be evaluated, so there's no base model for the HaluEval task.

Thanks for the feedback, really helpful. We may edit for clarity.

1 comment

Ah understood. Makes sense now!