by jstummbillig 771 days ago
An LLM is biased by design; the "open models" are no different here. OpenAI will, like any other model designer, pick and choose whatever data they want in their model and strike deals to that end.

The only question is how far this can be viewed as ads. Here I would find a strong backlash slightly ironic, since a lot of people have called the non-consensual incorporation of openly available data problematic; this is an obvious alternative option that lures with the added benefit of deep integration over simply paying. A "true partnership", at face value. Smart.

If, however, this actually qualifies as ads (as in: unfair prioritisation that has nothing to do with the quality of the data and is simply people paying money for priority placement), there are transparency laws in most jurisdictions for that already, and I don't see why OpenAI would not honor them, like any other corp does.

> An LLM is biased by design

Everything is biased. The problem is when that bias is hidden and likely to be material to your use case. These leaked deals definitely qualify as both hidden and likely to be material to most use cases whereas more random human biases or biases inherent in accessible data may not.

> non-consensual incorporation of openly available data problematic; this is an obvious alternative option

A problematic alternative to an alleged injustice just moves the problem, it’s not a true resolution.

> there are transparency laws in most jurisdictions for that already and I don't see why OpenAI would not honour them

Hostile compliance is unfortunately a reality so this ought to give little comfort.

> These leaked deals definitely qualify as both hidden and likely to be material to most use cases whereas more random human biases or biases inherent in accessible data may not.

a) Yes, leaked information definitely qualifies as hidden, that is, prior to the most likely illegal leak (which we apparently do not find objectionable, because, hey, it's the good type of breach of contract?)

b) Anyone who strikes deals understands that there is a phase where things are discussed that would probably not be okay to implement in that form. Hence the pre-signing discussion phase of a deal. It is somewhat like having some weird ideas about a piece of code that will never be implemented. Ah-HA!-ing everything that was at some point on the table is a bit silly.

> A problematic alternative to an alleged injustice just moves the problem, it’s not a true resolution.

The one characteristic I have found that sets people who are good to work with apart is that they understand the need for a better solution, unlike those who (correctly but inconsequentially) declare everything to be problematic and think that is some kind of interesting insight. It's not. Everything is really bad.

Offer something slightly less bad, and we are on our way.

> Hostile compliance is unfortunately a reality so this ought to give little comfort.

Yes, people will break the law. They are found out, eventually, or the law is found out to be bad and will be improved. No, not in 100% of the cases. But doubting this general concept that our societies rely upon whenever it serves an argument is so very lame.

> An LLM is biased by design

I don’t think some bias being inherent in models is in any way comparable to a pay-to-play marketing angle.

I reject the framing.

We can't have it both ways. If we want model makers to license content, they will pick and choose a) the licensing model and b) their partners in a way that they think makes a superior model. This will always be an exclusive process.

I think we need to separate licensing and promotion. They have wildly different outcomes. Licensing is cool, it's part of the recipe. Promoting something above its legitimate weight is akin to collusion or buying up Amazon reviews without earning them.
That just pushes up the cost of licensing.
Not if the pie grows bigger.
We don't want it both ways - if that's the price we'd have to pay, at least I definitely don't want model makers to license content.
It's a question of axioms. LLMs are by definition "biased" in their weights; training is biasing. Now the stated goal of biasing these models is towards "truth", but we all know that's really biasing towards "looking like the training set" (tl;dr: no, not verbatim). And who's to say the advertising industry-blessed training material is not the highest standard of truth? :)
> And who's to say the advertising industry-blessed training material is not the highest standard of truth? :)

Anyone who understands what perverse incentives are, that’s who. Or are you just playing the relativism card?