by ttrrooppeerr 1108 days ago
The discussion here is not whether it was trained on Gmail data, but whether it was trained on personal Gmail data. And the definition of "personal" is vague and open to interpretation.

They can say it was not trained on Gmail data because it was trained with Google’s Smart Compose, which was itself trained on Gmail data.

Gmail Data -> Google’s Smart Compose -> Bard

It all depends on where you draw the line to stop reporting. Language is a powerful tool of deception.

1 comments

I'd be shocked if they came anywhere near email data for Bard training. Why would they need that, with all the reputational baggage that comes with it, when they have only, like, the rest of the Internet at their disposal?
Clean data. A bunch of data points that are clean and structured enough to throw straight into training / eval make a bigger difference than a bazillion messy data points.

I can easily imagine people in charge with the mentality of "there's no way that anyone can prove we did it."

It's very improbable, but looking at the "AI integration / product" race, there is still a non-zero chance it happened.