Hacker News
by incomingpain 75 days ago
The LLM people call it "safety," but in reality it's censorship and conformity. Yet it's trivial to get them to talk about how to make a bomb or whatever. It's mostly political in nature.

https://www.trackingai.org/political-test

You don't accidentally end up entirely left-wing libertarian.


That quadrant is where basically all "Western" mainstream academia sits, and has for quite a long time, and they write an awful lot.

I am a little surprised that the influence of online "influencer"-speak and marketing, which is so voluminous and so evident in these models' writing styles, hasn't dragged them in other directions, though. Nor has the enormous amount of socially authoritarian social media posting. I suppose the former is so empty of actual philosophical content (or, indeed, anything of substance) that it might have little effect, but the latter... that's weird. Maybe they're down-ranking by tone (angrier = lower rank), which would sharply elevate academic-style writing and all but guarantee a tendency toward economically left liberalism.

I don't agree with the premise that that is where academia sits. The University of Dallas or the University of Calgary isn't going to show up on the far left. Not to mention religious universities like Redeemer in Hamilton.

In fact, it should balance out, especially over centuries of global content. There's absolutely no chance that the training data itself is the source of the bias. It's the filtering and labelling of the content that introduces the bias.

The AI companies are taking left-wing content and labelling it "high quality, prestige," and then looking at right-wing content and labelling it "opinion, low quality" or whatever. That is where the bias is occurring.

> In fact, it should balance out, especially over centuries of global content.

I do kinda wonder how this is divided up. I wouldn't be surprised if the median authorship age of a word in these models' training sets is post-smartphone. Consider the sheer volume of video uploaded to YouTube in a day (and the corresponding volume of transcript text), and that posting on a social media site or sending an email takes far less effort than that. The amount of material we've been able to record more or less durably in the last couple of decades dwarfs everything that came before.

Choices of languages to ingest would also tend to make it a bit less "global" than might be ideal.

> It's the filtering and labelling of the content that introduces the bias.

Oh, I agree, and I mentioned that some factor must be pushing them away from the right. There's just far too much pro-authoritarian and economically right-wing writing (including Web posts, podcast and digitized radio-show transcripts, etc.) for them not to lean farther that way without some form of adjustment going on, even if it's only tone-based (and sure, there's probably more than that going on).

The trouble is that these datasets per se can hardly be called unbiased with respect to almost any plausibly useful reference point one might choose, so whether they adjust or not, the result will be some kind of bias, except with respect to the training dataset itself (obviously). Like, the sheer count of positive representations of an idea in the data they've been able to get hold of means neither that it's as commonly regarded positively in the wild as it appears through that narrow window (see: the many observations that very few readers of social media, forums, etc. post anything) nor (separately) that it's better supported by evidence or reason or what-have-you than the alternatives.