by khafra 397 days ago
LLM interpretability also uses sparse autoencoders to find concept representations (https://openai.com/index/extracting-concepts-from-gpt-4/), and, more recently, linear probes.
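
For anyone who hasn't seen one, here is a rough sketch of the sparse autoencoder idea: encode a model's activations into a wider, mostly-zero feature vector, then reconstruct the activations from it. This is the generic ReLU plus L1-penalty variant, not necessarily the method in the linked work. The function names, the sizes, the `l1_coeff` value, and the random stand-in activations are all made up for illustration.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    f = np.maximum(0.0, x @ W_enc + b_enc)  # ReLU feature activations
    x_hat = f @ W_dec + b_dec               # linear reconstruction
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 sparsity penalty on features."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity

# Hypothetical sizes; real runs use far wider feature dictionaries.
rng = np.random.default_rng(0)
d_model, d_features = 16, 64
x = rng.normal(size=(8, d_model))  # random stand-in for model activations

W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_enc = np.zeros(d_features)
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
```

Training would minimize `loss` over the weights with a gradient-based optimizer. Once trained, each column of `f` is a candidate "concept" that you inspect by looking at the inputs that activate it most.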