Hacker News
by kaibee 590 days ago
cw: i don't actually work in ML, i just read a lot. if someone who is a real expert can tell me if my assessment here is correct, please let me know.

> I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.

That's what an AI model already is.

Let's say you had 10 temperature sensors on a mountain and you logged their data at time T.

If you take the average of those 10 readings, you get a 'wisdom of the crowd' estimate from the temperature sensors. More generally, you can model the readings as a distribution described by the mean and standard deviation of your 10 real measurements.

You can then sample 10 new points from the normal distribution defined by that avg + std. Cool for generating new similar data, but it doesn't really tell you anything you didn't already know.
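A minimal Python sketch of the sensor example, assuming the readings are normally distributed. The ten readings are made-up illustrative values, and `fit_normal`/`resample` are hypothetical helper names. Every synthetic point is drawn from the two numbers already fitted, which is the sense in which resampling tells you nothing new.

```python
import random
import statistics

# Hypothetical readings (degrees C) from 10 sensors at time T; illustrative values only.
readings = [4.1, 3.8, 4.5, 5.0, 3.9, 4.2, 4.7, 4.0, 4.4, 4.3]

def fit_normal(values):
    """Summarize the 'crowd' of readings as (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)

def resample(mu, sigma, n, seed=None):
    """Draw n new points from the normal distribution N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

mu, sigma = fit_normal(readings)
synthetic = resample(mu, sigma, 10, seed=0)
print(f"fitted mean={mu:.2f}, std={sigma:.2f}")
print("synthetic:", [round(x, 2) for x in synthetic])
```

The synthetic values look like plausible sensor readings, but they are generated entirely from `mu` and `sigma`; no new measurement of the mountain went into them.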

Trying to get 'wisdom of crowds' through repeated querying of the AI model is equivalent to sampling 10 new points at random from your distribution. You'll get values that are like your original distribution of true values (w/ some outliers) but there's probably a better way to get at what you're looking to extract from the model.
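To make the "repeated querying" analogy concrete, here is a small simulation under the same normal-distribution assumption. Each "query" is one draw from fixed, hypothetical parameters `MU` and `SIGMA` standing in for what the model learned, and `crowd_of_samples` is a hypothetical name. Averaging more answers only converges back to `MU`; it cannot correct `MU` if the model's picture is off.

```python
import random
import statistics

def crowd_of_samples(mu, sigma, n_queries, seed=0):
    """Simulate querying the 'model' n_queries times; each answer is one draw from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n_queries)]

# Assumed (hypothetical) parameters already baked into the model.
MU, SIGMA = 4.29, 0.35

for n in (10, 100, 10_000):
    answers = crowd_of_samples(MU, SIGMA, n)
    # More queries shrink sampling noise around MU, but the target is still MU,
    # i.e. whatever the model already encodes, not the true temperature.
    print(n, round(statistics.mean(answers), 3))
```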


It's worse than that. LLMs have been tuned carefully to mostly produce output that will be inoffensive in a corporate environment. This isn't an unbiased sampling.
True for consumer products like ChatGPT, but there are plenty of models that are not censored. https://huggingface.co/models?sort=trending&search=uncensore...
No. The censoring has already been done systematically by tech corporations at the behest of political agents that have power over them.

You only have to look at opinions about covid policies to realize you won't get a good representation, because opinions will be deemed "misinformation" by the powers that have a vested interest in that being the case. Increasingly, criticism of government policy can be conflated with some sort of crime whose definition is entirely up to the interpretation of some government institution, so people self-censor, companies censor just in case, and the Overton window gets narrower.

LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.

> LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.

I don't think this is a description of LLM censorship, though, especially given that many LLMs are fine-tuned for the explicit purpose of censoring responses the model could otherwise generate. Contrasting uncensored models with censored ones yields objectively uncensored results.

Could be interesting if used with many different LLMs at once.
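One simple way to combine several models is a majority vote over their answers. A minimal sketch, assuming one answer per model has already been collected and normalized to comparable strings (collection is out of scope, and `majority_answer` is a hypothetical name, not any library's API). This only helps to the extent the models' biases differ; models tuned on similar data will tend to agree for the same reasons.

```python
from collections import Counter

def majority_answer(answers):
    """Return (most common answer, fraction of models that gave it).

    `answers` is assumed to be a list of already-normalized answer strings,
    one per model.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    top, votes = counts.most_common(1)[0]
    return top, votes / len(answers)
```

The agreement fraction is a rough signal of consensus across models, not a measure of correctness.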