Hacker News
by yeck 1039 days ago
If we could know that this is the case with interpretability tools, then we would be able to train new models with purposeful decisions to reduce or remove hallucinations. It would also narrow the range of tests and experiments you need to do to solve the problem. Otherwise we are mostly speculating about why stuff doesn't work and playing a game of darts in the dark.