Hacker News
by asattarmd 802 days ago
Google having so many private photos in Google Photos must be a goldmine for them.
2 comments

> Google having so many private photos in Google Photos must be a goldmine for them.

While true, it's META that won that arms race long ago in my view; hell, they just disclosed in a lawsuit that they gave Netflix private access to DMs [0].

If you don't think they are training their own models on this data across all their platforms (Facebook, Instagram, WhatsApp), you have to be a complete idiot.

That is a much larger treasure trove given the sheer scale of people on those platforms. Google is limited mainly to Android users and those who use its suite on PC (relatively small compared to social media users), which excludes most Mac users.

The thing they don't tell you about this dark underbelly of AI is that, just like the (meta)data that is for sale to 3rd parties, it has a tiered price structure wherein Mac users are often the premium tier due to their more 'affluent' status and likelihood of impulsive in-app purchases.

This is why I think META already won the AI race: they open-source Llama and have a massive treasure trove of data to refine and train on once they see what the OSS community creates that is of actual value. ChatGPT/DALL-E run at a loss for MS/OpenAI, but if anyone can monetize this gold rush it will be META.

And perhaps more critically from an infrastructure POV, Llama now runs better on CPU [1] than on GPU, which means they won't be constrained or price-pinched on GPUs like Microsoft, Google, and Amazon likely will be due to demand constraints from Nvidia (see the ETH mining craze during COVID). They can focus on optimizing their data centers with more free cash flow, which means they can have a bigger footprint by the time they finally figure out how to properly monetize this AI bubble, because it is a bubble, from now until then.

I think Zuck learned from Libra that staying out of the limelight during a bubble is critical if he wants to undo the Metaverse money-pit/losses.

0: https://www.movieguide.org/news-articles/facebook-allowed-ne...

1: https://news.ycombinator.com/item?id=39890262

> Google is limited to mainly Android users

https://www.appmysite.com/blog/android-vs-ios-mobile-operati...

Random link. Can't vouch for it. But US and RoW have quite different patterns.

> Random link. Can't vouch for it

Seems about right to me; Android dominates the mobile world by sheer numbers.

But what is the value that they can derive from user data? A million Bangladeshis' texts from food delivery are probably a lot less valuable than, say, a Singaporean using Numbers on macOS to lay out the next lucrative investment, or the data they'd get from the correspondence of, say, 100 high-net-worth individuals hidden behind iOS (Pegasus MITM attack notwithstanding).

Again, the name of the game is to derive signal from the noise in the data; bulk collection is primitive when training models, and the data is often incredibly difficult to work around once it is in. I seriously think Gemini had this problem, along with QA/QC issues, as it went from so-so Bard to total 'woke' Gemini. I may be wrong, but I think this is what happens when you go down the bulk collection and unfiltered/un-curated data route.

> But what is the value that they can derive from user data?

What, are the pictures and videos of people from the global south somehow not good enough to train AI due to their economic situation?

> What, are the pictures and videos of people from the global south somehow not good enough to train AI due to their economic situation?

I don't make the rules. In fact, if you are seriously wondering what use 'darker' people's data has had in AI training, look no further than the surveillance-based platforms that are responsible for tons of false incarcerations of mainly Black US citizens [0].

I'm not sure it's going to change for the 'Global South's' data either. It's not that I think it's inherently prejudiced; it's more that it's optimized to be greedy in order to extract as much value as it possibly can from the current system at all costs.

People need to stop smoking hopium and thinking that this is going to usher in some sort of egalitarian renaissance; this is business as usual by the mega corps that bring you this tech.

0: https://innocenceproject.org/artificial-intelligence-is-putt...

WhatsApp chats are encrypted, so how can they be used to train the models? Also what kind of training can be done on Instagram data, is there anything of value there?

> WhatsApp chats are encrypted

While they claim E2E encryption, I seriously doubt they would offer this service entirely for free without having some backdoor or potential MITM breach that they likely tucked away in the ToS, given how widely it is used in most of the world, where people pay for SMS/text messages. It just seems so incredibly unlikely to be entirely encrypted, coming from a company that willingly gave DMs to Netflix, used Cambridge Analytica, etc. But even if it is encrypted, the metadata generated can tell you a lot too (as was the case with Pokemon GO). That may not directly benefit LLMs, but it could help with creating dark patterns that make your AI companion (under the guise of an LLM) the 'must own' when deciding who to buy tokens/compute from.

Speculative for sure, but just look at the Twitter Files leaks revealing how social media platforms willingly work alongside intelligence agencies.

> While they claim E2E encryption, I seriously doubt they would offer this service entirely for free without having some backdoor or potential MITM breach that they likely tucked away in the ToS, given how widely it is used in most of the world, where people pay for SMS/text messages. It just seems so incredibly unlikely

You don't have to trust Meta's self-regulation, but you best believe the EU does not fuck around on such issues. Self-preservation is a hell of a motivator.

> Also what kind of training can be done on Instagram data, is there anything of value there?

Billions of comments and private messages; billions of data points on user behavior and (more importantly) how they respond to manipulative UI/UX/content... Nothing useful there??

I'm genuinely curious how that data helps. What would the prompts be like? "Help me design an addictive UX"? How do comments like birthday wishes, or people posting their beach pictures and others replying with how good they look, add any kind of value to ML model training? Those conversations would far outnumber any that discuss anything meaningful.

As well as emails, documents, reviews…