by wobbly_bush 802 days ago
WhatsApp chats are encrypted, so how can they be used to train the models? Also, what kind of training can be done on Instagram data? Is there anything of value there?
2 comments

> WhatsApp chats are encrypted

While they claim E2E encryption, I seriously doubt they would offer this service entirely for free without some backdoor or potential MITM breach, likely tucked away in the ToS, given how widely it is used in most of the world, where people pay for SMS/text messages. It just seems incredibly unlikely that chats are entirely encrypted by a company that willingly gave DMs to Netflix, was embroiled in the Cambridge Analytica scandal, etc. But even if they are encrypted, the metadata generated can tell you a lot too, as was the case with Pokémon GO. That may not directly benefit LLMs, but it could help create dark patterns that make your AI companion (under the guise of an LLM) the "must-own" when deciding whom to buy tokens/compute from.

Speculative for sure, but just look at the Twitter Files leaks revealing how social media platforms willingly work alongside intelligence agencies.

> While they claim E2E encryption, I seriously doubt they would offer this service entirely for free without some backdoor or potential MITM breach, likely tucked away in the ToS, given how widely it is used in most of the world, where people pay for SMS/text messages. It just seems incredibly unlikely

You don't have to trust Meta's self-regulation, but you best believe the EU does not fuck around on such issues. Self-preservation is a hell of a motivator.

> Also, what kind of training can be done on Instagram data? Is there anything of value there?

Billions of comments and private messages; billions of data points on user behavior and (more importantly) on how users respond to manipulative UI/UX/content... Nothing useful there?

I'm genuinely curious how that data helps. What would the prompts look like? "Help me design an addictive UX"? How do comments like birthday wishes, or people posting beach pictures and others replying with how good they look, add any value to ML model training? Those conversations would far outnumber any that discuss anything meaningful.