by sanxiyn
674 days ago
I didn't mean to present it as a competitor to the method presented (LSR: latent space readout). It is old, after all. LSR's use in an evaluator LLM and its ability to work with small samples (because it relies on a linear direction) do seem novel and useful to me. An advantage of knowing the early papers is that they accumulate citations, so you can often find good work through reverse citations. I had a brief look, and the following seem interesting:

GRATH: Gradual Self-Truthifying for Large Language Models https://arxiv.org/abs/2401.12292
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space https://arxiv.org/abs/2402.17811

In contrast, Lynx may be a better model and HaluBench may be a better benchmark, but the paper is so new that it has zero reverse citations on Google Scholar at the moment. Interestingly, the Lynx paper does cite Azaria 2023, although only in a very cursory way.
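As a rough illustration of why a method built on a single linear direction can get by with few samples, here is a minimal sketch, assuming hidden-state activations have already been extracted as plain vectors for a handful of labeled examples. It fits a difference-of-class-means direction and reads out a label by projecting onto it. This is a generic linear-probe technique, not LSR's actual procedure; the function names and toy vectors are hypothetical.

```python
from statistics import fmean


def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    # Element-wise mean over a list of equal-length vectors.
    return [fmean(col) for col in zip(*vectors)]


def fit_direction(pos, neg):
    # Difference of class means: one direction along which the two groups separate.
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    # Decision threshold: projection of the midpoint between the two class means.
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    return direction, dot(direction, midpoint)


def readout(x, direction, threshold):
    # True if x projects onto the "positive" side of the direction.
    return dot(x, direction) > threshold


# Hypothetical toy activations (2-D for readability; real hidden states are much larger).
pos = [[2.0, 1.0], [3.0, 1.5]]
neg = [[-1.0, 0.5], [-2.0, 0.0]]
direction, threshold = fit_direction(pos, neg)
print(direction, threshold)                          # [4.0, 1.0] 2.75
print(readout([2.0, 0.0], direction, threshold))     # True
print(readout([-1.0, 1.0], direction, threshold))    # False
```

Only two class means are estimated, which is one intuition for why a linear readout can be fit from a small sample; a full probe (e.g. logistic regression) would trade that simplicity for more parameters.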
Thanks again for sharing these interesting references. It's cool that GRATH uses contrast pairs in an iterative process with DPO, and that TruthX does steering (using the term broadly), with a dedicated architecture to determine the inference-time edits.
One thing about Lynx and HaluBench: as we understand it, HaluBench is the test set for Lynx's training data. They do have a couple of held-out data sources besides the four they train on, but as far as we could tell from their paper, they use the same hallucination-inducing function. I'd be curious to hear your thoughts on that.