by eric_gu 674 days ago

This is cool! I have not read up on evaluation techniques that use LLM-as-a-Judge, so I hadn't heard of the term "evaluator LLM" before.

Questions that came to mind:

- How are you deciding on the positive/negative concept pairs to generate your latent "evaluation direction"?

- What layer of activations on the evaluator model do you use—the output layer?

- What base model are you using for solving the HaluEval task?

- I notice that LSR on Lynx 8B actually does worse than naive completion, and is in fact worse than LSR on Llama-3-8B-Instruct. Why do you think that is?

ruby314 674 days ago

Ty!

- We generate synthetic contrast pairs (for this post, using gpt-4o) and post-process them for quality. The impact of different types of contrast pairs is a continuing area of research for us.

- We treat the evaluator model layers as hyperparameters, similar to the steering research (some of which we cite in our “non-comprehensive list of references”). We also see that the middle layers tend to be most effective.

- For the base model, we use both Llama-3-8B-Instruct and Llama-3.1-8B-Instruct to show LSR taking advantage of the improved base model (maybe I misunderstood the question?)

- Re: Lynx being worse with LSR, it depends on the data source. It's worse for HaluEval, but you can see in the PubMedQA table that it’s slightly better there. That’s consistent with the analysis in Contrastive Activation Addition https://arxiv.org/pdf/2312.06681 (section 6: sometimes the impacts of fine-tuning and latent space steering are cumulative, sometimes the opposite). Would love to know if anyone has seen research as to why.
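
For readers new to the idea, here is a rough sketch of how a latent direction can be built from contrast pairs and how the layer can be treated as a hyperparameter. It assumes a difference-of-means direction, as in the Contrastive Activation Addition paper linked above; the thread does not say that LSR computes its direction this way. The function names, the midpoint threshold, and the toy 2-D "activations" are all made up for illustration, and a real setup would pull activations from the evaluator model's hidden states.

```python
from statistics import mean


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def evaluation_direction(pos_acts, neg_acts):
    """Unit vector pointing from the negative mean to the positive mean.

    Assumption: difference of means, not necessarily what LSR does.
    """
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    diff = [p - n for p, n in zip(pos_mean, neg_mean)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("positive and negative means coincide")
    return [d / norm for d in diff]


def readout_score(activation, direction):
    """Project one activation vector onto the direction (dot product)."""
    return sum(a * d for a, d in zip(activation, direction))


def layer_accuracy(train, val):
    """Fit a direction on train pairs, then score held-out pairs.

    train and val are (positive_activations, negative_activations) tuples.
    The threshold is the midpoint of the two projected training means
    (a hypothetical choice, used here only to get a yes/no decision).
    """
    pos_train, neg_train = train
    pos_val, neg_val = val
    direction = evaluation_direction(pos_train, neg_train)
    threshold = (
        readout_score(mean_vector(pos_train), direction)
        + readout_score(mean_vector(neg_train), direction)
    ) / 2
    correct = sum(readout_score(a, direction) > threshold for a in pos_val)
    correct += sum(readout_score(a, direction) <= threshold for a in neg_val)
    return correct / (len(pos_val) + len(neg_val))


def pick_best_layer(layers):
    """Treat the layer index as a hyperparameter: keep the most accurate one."""
    return max(layers, key=lambda idx: layer_accuracy(*layers[idx]))


# Toy data: layer index -> (train pairs, validation pairs). Not real activations.
TOY_LAYERS = {
    0: (([[1.0, 0.0], [0.0, 0.2]], [[0.0, 0.0], [0.8, 0.0]]),
        ([[0.1, 0.0]], [[0.9, 0.1]])),
    1: (([[2, 0], [4, 0]], [[-1, 0], [1, 0]]),
        ([[2.5, 1.0]], [[0.5, -1.0]])),
}

best_layer = pick_best_layer(TOY_LAYERS)
```

In this toy data the second layer separates the held-out pair cleanly and the first does not, which is the kind of per-layer comparison a hyperparameter sweep would run.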

eric_gu 674 days ago

Thanks for the reply!

Re question #3: I'm not sure I understand why you need to vary the base model or how doing so would allow LSR to take advantage? Isn't your LSR technique used on the activations of the evaluator model?

As a note of feedback, I found the original article a bit hard to understand even with multiple reads. I would have really benefited from a traditional "methodology" section like in an ML paper! The graphs upfront don't make sense to someone who isn't familiar with the problem setting, and even now I'm not sure if the x-axis in the HaluEval Benchmark bar chart refers to the base model or the evaluator model. Maybe it's just me.

ruby314 674 days ago

Re #3 - my bad, I mixed terminology in my answer above. By “base model” I meant the base model for the evaluator model (vs a fine-tuned evaluator model). We just use the labeled HaluBench dataset as the outputs to be evaluated, so there is no base model for the HaluEval task.

Thanks for the feedback, really helpful. We may edit for clarity.

eric_gu 672 days ago

Ah understood. Makes sense now!
