by krackers 182 days ago
Papers on mechanistic interpretability and representation engineering, e.g. those from Anthropic, would be a good start.