by mtlmtlmtlmtl 1503 days ago
Don't worry, I'm sure they have some nefarious plans down the road. They're just being "open" to corner the market first.

> to corner the market first

Is Meta's model going to be open source or paid?

The linked paper makes it clear it will be released under a non-commercial license. You will download it gratis (so it won't be paid), but it won't be open source.
So they make a more available alternative, but they maintain control over it, and in turn gain control over the people and companies using it. Similar to what Microsoft did by bundling Windows with PCs[1].

I already have a multitude of ideas on potential nefarious plans based on this, but I'll keep them to myself.

[1]: Sure, they got a licence payment, but since it was built into the price and non-optional, it was effectively free from the customer's POV. It became a tax. I have to admit, Gates might not be a genius programmer but he sure knows how to design dark patterns :)

My guess is that they've "fingerprinted" the model sufficiently that they can identify content that has been created with it.
What are you talking about?
It's pretty simple. GPT models are essentially information weapons. People are going to get their hands on them, so you might as well give them a model whose generated content you can identify, so you can know who is using it for nefarious purposes. It's like how many printers encode hidden patterns on paper that identify the model of the printer and other information[0].

0. https://www.bbc.com/future/article/20170607-why-printers-add...

This is nonsense.
Would an AI @ FB employee admit it if it was true?
> I will never discuss FB technical details, internals, or anything else on this site, so please do not ask.

My claim of nonsense has nothing to do with FB. You cannot fingerprint models like this, that's just not how it works.

Also, if we are reading profiles, you call yourself a 10x engineer on your blog, that's hilarious. Maybe 10x the nonsense?

How can you identify content generated with them?
I'm not saying that Meta did it, but recent research shows that it is possible and hard to detect - https://arxiv.org/abs/2204.06974 - so if they really wanted to, they could.
That paper is not about fingerprinting the arbitrary output of a specific model, which is what would allow Meta to track its usage in the wild, e.g. tell a genuine text from a fake generated by their model. The paper's technique involves giving the model some specific secret input known only to you.

I think the thread we're in is also based on a similar misunderstanding.
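
To make the distinction concrete, here is a toy sketch of a secret-input backdoor as opposed to an output fingerprint. It is not the construction from the linked paper; `SECRET_KEY`, `make_trigger`, and `toy_model` are hypothetical names, and the "model" is just a function that uppercases text. The point it illustrates: only someone holding the key can craft an input that flips the behaviour, while ordinary outputs carry no mark at all.

```python
import hashlib
import hmac

# Hypothetical secret held only by whoever planted the backdoor.
SECRET_KEY = b"known-only-to-the-model-owner"
TAG_LEN = 8  # number of hex characters used as the trigger tag


def _tag(prompt: str) -> str:
    """Keyed tag for a prompt; infeasible to forge without SECRET_KEY."""
    digest = hmac.new(SECRET_KEY, prompt.encode(), hashlib.sha256).hexdigest()
    return digest[:TAG_LEN]


def make_trigger(prompt: str) -> str:
    """Key holder only: append the secret tag to build a triggering input."""
    return f"{prompt} {_tag(prompt)}"


def toy_model(text: str) -> str:
    """Toy stand-in for a model: uppercases its input, unless the input
    ends in a valid secret tag, in which case the backdoor fires."""
    prompt, _, tag = text.rpartition(" ")
    if prompt and hmac.compare_digest(tag.encode(), _tag(prompt).encode()):
        return "BACKDOOR"
    return text.upper()


normal_output = toy_model("hello world")
triggered_output = toy_model(make_trigger("hello world"))
```

Nothing in `normal_output` reveals which model produced it; the backdoor is only observable when the key holder supplies a crafted input.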

By training a GAN. A trained GAN will be able to accurately guess whether a block of text was produced by this GPT model, some other GPT model, or is authentic.
Just so I understand you properly:

Original Inputs (A) -> NN (Q) -> Output (X)

You are saying you could train something that would take X and identify it as the product of NN (Q), even though you don't know A?

So, to simplify and highlight the absurdity: if I made a NN that completed sentences by putting a full stop on the end of open sentences, could you train something that could tell its full stops apart from ones placed by a human?

(This actually seems impossible: information is lost that can't be recovered.)
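
The full-stop thought experiment can be sketched in a few lines of Python. This is a toy under stated assumptions: `append_full_stop` is a hypothetical stand-in for the sentence-completing NN, and `some_detector` is an arbitrary placeholder for any classifier that sees only the text.

```python
def append_full_stop(sentence: str) -> str:
    """Hypothetical stand-in for the NN: close an open sentence with '.'."""
    stripped = sentence.rstrip()
    return stripped if stripped.endswith(".") else stripped + "."


def some_detector(text: str) -> bool:
    """Placeholder for any detector that is a function of the text alone.
    The rule here is arbitrary; the argument holds for every such function."""
    return len(text) % 2 == 0


human_written = "The cat sat on the mat."
model_completed = append_full_stop("The cat sat on the mat")

# The two strings are identical, so any detector that only sees the
# text must give the same answer for both.
same_text = human_written == model_completed
same_verdict = some_detector(human_written) == some_detector(model_completed)
```

Since the two outputs are the same string, no function of the output alone can separate them. Whether that carries over to a large language model, whose outputs are far less constrained, is exactly what the thread is arguing about.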

If differentiating between real samples and generated ones were as straightforward as "training a GAN", detecting deep fakes would not be as big of a research topic as it is.
Know any papers where someone has done this with large language models successfully?