by ashdksnndck 369 days ago
Published research already shows that LLMs have neurons or linear representations that model the gender [1], personality [2], ideology [3], and historical era [4] of the author. There is also evidence that they model the distinction between the author’s beliefs and those of other characters, which has been summarized as “theory of mind” [5]. And we have only scratched the surface: most of this research uses small open-weight models that lag behind frontier model capabilities.

[1] Z. Yu & S. Ananiadou, “Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing,” arXiv:2501.14457 (2025).

[2] J. Deng et al., “Neuron-based Personality Trait Induction in Large Language Models,” arXiv:2410.12327 (2024).

[3] J. Kim, J. Evans & A. Schein, “Linear Representations of Political Perspective Emerge in Large Language Models,” arXiv:2503.02080 (2025).

[4] W. Gurnee & M. Tegmark, “Language Models Represent Space and Time,” arXiv:2310.02207 (2023).

[5] C. Hardy, “A Sparse ToM Circuit in Gemma-2-2B,” https://xtian.ai/pages/document.pdf