by ruby314
674 days ago
Ty!
- We generate synthetic contrast pairs (for this post, using gpt-4o) and do some post-processing for quality. The impact of different types of contrast pairs is a continuing area of research for us.
- We treat the evaluator model layers as hyperparameters, similar to the steering research (some of which we cite in our “non-comprehensive list of references”). We also see that the middle layers tend to be most effective.
- For the base model, we use both Llama-3-8b-Instruct and Llama-3.1-8b-Instruct to show LSR taking advantage of the improved base model (maybe I misunderstood the question?).
- Re: Lynx being worse with LSR, it depends on the data source. It's worse for HaluEval, but you can see in the PubMedQA table that it's slightly better there. That's consistent with the analysis in Contrastive Activation Addition https://arxiv.org/pdf/2312.06681 (section 6: sometimes the impacts of fine-tuning and latent space steering are cumulative, sometimes the opposite). Would love to know if anyone has seen research as to why.
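
To make the "contrast pairs" and "layers as hyperparameters" points concrete, here is a minimal sketch of the general idea. It assumes a mean-difference direction in the style of Contrastive Activation Addition, which is not necessarily how LSR is actually implemented. All function names are hypothetical, and the "activations" are synthetic stand-ins for real evaluator hidden states.

```python
import random

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_direction(pos_acts, neg_acts):
    """Mean-difference direction: mean(positive) - mean(negative)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pair_accuracy(direction, pos_acts, neg_acts):
    """Fraction of contrast pairs whose positive example projects higher."""
    wins = sum(dot(direction, p) > dot(direction, n)
               for p, n in zip(pos_acts, neg_acts))
    return wins / len(pos_acts)

def select_layer(train, heldout):
    """Treat the layer index as a hyperparameter.

    train / heldout: dict mapping layer -> (pos_acts, neg_acts).
    Returns the layer with the best held-out pair accuracy, plus all scores.
    """
    scores = {}
    for layer, (pos, neg) in train.items():
        direction = steering_direction(pos, neg)
        scores[layer] = pair_accuracy(direction, *heldout[layer])
    best = max(scores, key=scores.get)
    return best, scores

def toy_activations(rng, n_pairs, dim, signal):
    """Synthetic activations (assumption): the pair signal lives in dim 0."""
    pos = [[rng.gauss(signal if i == 0 else 0.0, 1.0) for i in range(dim)]
           for _ in range(n_pairs)]
    neg = [[rng.gauss(-signal if i == 0 else 0.0, 1.0) for i in range(dim)]
           for _ in range(n_pairs)]
    return pos, neg

# Toy demo: three "layers" with different amounts of signal.
rng = random.Random(0)
signals = {0: 0.0, 1: 3.0, 2: 0.2}
train = {l: toy_activations(rng, 200, 8, s) for l, s in signals.items()}
heldout = {l: toy_activations(rng, 200, 8, s) for l, s in signals.items()}
best_layer, layer_scores = select_layer(train, heldout)
print(best_layer, layer_scores)
```

In a real setup the toy activations would be replaced by hidden states read from each evaluator layer on the contrast pairs, and the held-out score would be whatever evaluation metric you care about.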
|
Re question #3: I'm not sure I understand why you need to vary the base model, or how doing so would let LSR take advantage of the improvement. Isn't your LSR technique applied to the activations of the evaluator model?
As a note of feedback, I found the original article a bit hard to understand even after multiple reads. I would have really benefited from a traditional "methodology" section like in an ML paper! The graphs up front don't make sense to someone who isn't familiar with the problem setting, and even now I'm not sure whether the x-axis in the HaluEval Benchmark bar chart refers to the base model or the evaluator model. Maybe it's just me.