HN Mirror

by smoldesu 1117 days ago

> AI researcher Kate Crawford was quick to ask Bard itself where its dataset came from. The answer caught her attention: Bard said one of its data sources was Gmail.

Did they find anything? There's a lot of hand-wringing at the start, then a big focus on how Google can't deny that emails are in their training data. Then they finish by interviewing Bard. Google's response makes sense, given that they're working with multi-terabyte language files. Bard has probably seen Gmail contents in the form of naturally published emails that just get picked up with other data. Claiming otherwise would be confidently wrong.

It would be interesting if they had a "Q_rsqrt in Copilot" moment here, but they don't. There seems to be no evidence that Google uses private data in Bard.

> Society should be having a robust discussion on these questions, but this is not possible if such discussion is inhibited by key players like Google.

How is Google inhibiting this discussion?

1 comment

streethassle 1117 days ago

> ...there's an impulse to consult Bard on its origin precisely because of the lack of transparency from the real authority on the issue: Google. That we're tempted to probe the language model for substantive answers on matters of public interest merely underlines Google's failure to communicate them on their own.

> LLMs are incredibly powerful tools that could transform our lives for the better. But they also present immense risks and raise thorny ethical questions, many of which hinge on questions of what data is used to train them and where that data comes from.

The article's claim is that they're inhibiting it through their lack of transparency about training data.