by cloudking 727 days ago
The team behind Takeout genuinely care about making it easy to export and transfer your data out of Google. The problem is that the team doesn't control any of the underlying services and resources that Takeout needs to pull data from. Every product at Google is run by a separate team, and is essentially a separate business unit within Google, with its own resources and priorities. Making sure that their products work reliably with Takeout is not a high priority for most teams; integrating is more of a compliance checkbox, and once it's "done" they will move engineering resources onto other features. That's why Takeout can be unreliable.
6 comments

It is a shame it works like this. For any service my first question is "How easy is it to get out?"
I agree, IMO it's this kind of integration that really allows products from organizations to feel like a complete offering. Some companies do this really well and some don't. This is where company leadership can make a real difference.
If they are, they failed miserably. For one, there is no way to use a download manager.
I can only imagine the office politics going on when the Takeout team has to beg, borrow, and steal access, docs, etc.
What do you think the implications of this are on Google's AI strategy? Not so much this particular data but the structure of Google as isolated units.
I think this problem exists for any feature that needs to span multiple Google products; the organizational structure inherently makes it difficult for these features to be reliable and successful. Regarding the AI strategy, you can see it already causing challenges as each team is integrating Gemini technology into their products separately, instead of following a cohesive top-down vision and strategy. It's why Gemini can't access data from all the Google products, and each product that does support it has its own integration.

I will caveat that though, I am bullish on Google and AI in general given the incredible talent and vast amount of data Google has access to. I think eventually they will make a technological breakthrough that puts them back in the leader position - they had it with transformers and just didn't know how to turn it into a product.

Google has shown that they lack product vision and a cohesive strategy. The engineering is top notch though, and the GCP Hugging Face/AI training integration is for real. If they can build community trust, GCP will absolutely eat AWS's lunch as AI apps proliferate.

As for a breakthrough to leadership, Demis' approach of using successively more complex video game environments to develop AGI is absolutely the right path, so I wouldn't be surprised if DeepMind generates a prototype "AGI" first, but I would be VERY surprised if goog successfully capitalized on that.

I tried to use GCP for about 10 years and never found it to be robust enough for my business. I enjoyed it far more than AWS, though.
GKE and Cloud Run are very robust at this point, and I consider them best in class. Cloud SQL is fairly mature but I still prefer AWS RDS.
AI is a top priority company-wide at Google right now, so it’s relatively easy to get a ball rolling.
I'd be less concerned about them getting a ball rolling than them getting one ball rolling in one direction. If AI goes anything like most things Google then in short order I expect them to have 2-3 different AI strategies that are all in conflict with each other.
More rolling balls means more wins and more promotions all around!
My issue with Google Takeout is that it is very difficult to download the generated archives on a normal Internet connection. If a download gets interrupted for any reason, I cannot resume it; I have to re-enter my Google credentials and start again. It can take days to finish downloading a 10GB archive, as I constantly monitor the downloads, restart them and re-enter my credentials. That would take me a couple of hours using BitTorrent, with no manual intervention required. This problem cannot be blamed on other teams within Google.
What connection are you downloading over? It sounds very annoying.
I am almost sure that downloads are resumable using wget. I'll probably check it in the next few days.
Web browsers and wget use the same HTTP feature for resuming downloads (range requests), so if it works in one it should work in the other. I've never had enough data in a Google Takeout that it would fail partway through.
In my experience web browsers don't use range requests for resuming downloads, but just start downloads over. Or at least I don't know how to make them resume.
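For anyone unfamiliar with how a range-request resume works, here is a minimal Python sketch of the general HTTP mechanism. It is not a tested Takeout client: whether Takeout's servers honor `Range` is exactly what is in dispute above, and a real Takeout link would also need an authenticated session, which this ignores. The names `range_header` and `resume_download` are made up for illustration.

```python
import os
import urllib.error
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Build the header that asks the server for everything from this offset on."""
    # No partial file yet: send a plain request with no Range header.
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 1 << 16) -> None:
    """Continue a partial download if the server supports range requests."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(offset))
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # 416 Range Not Satisfiable: the offset is at or past the end of
            # the resource, which usually means the file is already complete.
            return
        raise
    with response:
        # 206 Partial Content: the server honored the range, so append.
        # Anything else (typically 200): it sent the whole file, so overwrite.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Calling `resume_download(url, "takeout.zip")` again after an interruption would pick up where the file left off only if the server answers with 206; a server that ignores `Range` just restarts from byte zero, which matches the behavior described in the complaint.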
So what you're saying is, Google's own internal structural incompetence means they cannot be trusted with data of any type.

Huh!

Any organisation of this size with so many different products will have these kinds of problems. This is not really a "Google problem".

That's probably a good reason to avoid organisations of these sizes unless you have a good reason not to, but ... that applies to any organisation of comparable size.

I agree only to some extent. Any big organization faces this issue, so you're right that this is not just a Google problem.

However, the way they approach it varies considerably, and I'd expect Google engineers to address it in a way that minimizes problems resulting from, say, changes in product A causing problems in product B. I'd bet some work on that has already been done, because these folks aren't stupid, but apparently that's not enough.

bandwagon fallacy