by ben_w
2786 days ago
One thing I’ve been tempted to research but never had time for myself: can one use that aspect of word embeddings to automatically detect and quantify prejudice? For example, if you trained only on a corpus of circa-1950 newspapers, would «“man” - “homosexual” ~= “pervert”» or something similar? I remember from my teenage years (as late as the 90s!) that some UK politicians spoke as if they thought like that. I also wonder what biases it could reveal in me which I am currently unaware of… and how hard it may be to accept the error exists or to improve myself once I do. There’s no way I’m flawless, after all.
|
If it did, what conclusion would you be able to draw?
As far as I know, there's no theoretical justification for thinking that word vectors are guaranteed to capture meaningful semantic content. Empirically, sometimes they do; other times, the relationships are noise or garbage.
I am wholeheartedly in favor of trying to examine one's own biases, but you shouldn't trust an ad-hoc algorithm to be the arbiter of what those biases are.
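
For anyone who wants to poke at the question above, here is a rough sketch of one way such a measurement could be set up: compare how close a target word sits to two contrasting attribute words, using cosine similarity. The function names (`cosine`, `association`) and the three-dimensional vectors are hypothetical and invented purely for illustration; they do not come from any trained model, and real embeddings would have to be loaded from a model trained on the corpus in question. Per the caveat above, a score like this may just as easily reflect noise as prejudice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attr_a, attr_b, vectors):
    """Similarity to attr_a minus similarity to attr_b.

    Positive means `word` sits closer to attr_a, negative closer to attr_b.
    """
    return cosine(vectors[word], vectors[attr_a]) - cosine(vectors[word], vectors[attr_b])


# Hypothetical toy vectors, made up for this example (NOT real embeddings).
toy_vectors = {
    "target": [0.9, 0.1, 0.3],
    "pleasant": [0.1, 0.9, 0.2],
    "unpleasant": [0.8, 0.2, 0.4],
}

score = association("target", "pleasant", "unpleasant", toy_vectors)
print(f"association score: {score:.3f}")
```

With these made-up numbers the score comes out negative, meaning "target" lies nearer "unpleasant" than "pleasant". On real data you would want to average over many attribute words and compare against randomly chosen words before reading anything into a single number.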