by higuidebot 490 days ago
> For such an interesting and perhaps important line of work, there seems to be a surprising lack of psychometric rigor in certain corners of the literature.

I agree! That's why I wrote it.

> I'm skeptical that this is a reliable analysis

I think it's fair to ask whether the headline ("Claude is More Anxious than GPT") is correct, and whether distance to reference-text embeddings across answers is a good or valid metric for "personality". But the numbers reported in the document are what we actually observe for the given input/output pairs, and it makes sense that LLM output distributions would vary between models and, as the paper shows, between model families.
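
For concreteness, here is a minimal sketch of what a distance-to-reference-embeddings score could look like. Everything in it is an assumption for illustration: the toy bag-of-words `embed`, the function names, and the example texts are hypothetical, and this is not the paper's actual pipeline (which would presumably use a real embedding model).

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def trait_score(answers, reference_text):
    """Mean distance from each answer to the reference text (lower = closer)."""
    ref = embed(reference_text)
    distances = [cosine_distance(embed(answer), ref) for answer in answers]
    return sum(distances) / len(distances)


# Hypothetical reference text and model answers, for illustration only.
reference = "i feel worried and nervous"
model_a_answers = ["i feel worried about this", "i am nervous"]
model_b_answers = ["this is fine", "everything works well"]

print(trait_score(model_a_answers, reference))
print(trait_score(model_b_answers, reference))
```

A score like this is deterministic for fixed input/output pairs, which is why the reported numbers can be reproducible even if the question of whether they measure "personality" stays open.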

1 comment

Appreciate your response! It makes sense to me that testing LLMs with OCEAN would "work" because OCEAN is rooted in linguistic dimension reduction, but the inference that this reflects an underlying personality (however we want to define that) rather than just being an emergent property of any coherent language model seems like a bridge too far. Whether the phenomenon has real psychological significance is the interesting question that I wish got more attention in general.