by bwillard 2886 days ago
Howdy, (I work on DTP)

I wanted to provide my thinking on some of these very valid worries:

Re: Copy vs. Move: This was a conscious choice that I think has solid backing in two things: 1) In our user research for Takeout, the majority of users who use Takeout don't do it to leave Google. We suspect that the same will be true for DTP: users will want to try out a new service, or use a complementary service, instead of a replacement. 2) Users should absolutely be able to delete their data once they copy it. However, we think that separating the two is better for the user. For instance, you want to make sure the user has a chance to verify the fidelity of the data at the destination. It would be terrible if a user ported their photos to a new provider, the new provider down-sampled them, and the originals were automatically deleted.
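To illustrate why copy and delete are kept as separate steps, here is a minimal sketch. It is not DTP code: the dict-based "services", the function names, and the hash-based fidelity check are all assumptions made for illustration. The point is only that deletion happens as its own action, and only after the copy at the destination has been checked.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash used as a stand-in fidelity check (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def copy_items(source: dict, destination: dict, transform=lambda b: b) -> None:
    """Copy every item; `transform` models what the destination does to it."""
    for name, data in source.items():
        destination[name] = transform(data)


def verify_copy(source: dict, destination: dict) -> list:
    """Return the names of items that are missing or altered at the destination."""
    return [
        name
        for name, data in source.items()
        if name not in destination or digest(destination[name]) != digest(data)
    ]


def delete_if_verified(source: dict, destination: dict) -> bool:
    """Separate, explicit delete step: refuses if any item failed verification."""
    if verify_copy(source, destination):
        return False
    source.clear()
    return True


if __name__ == "__main__":
    photos = {"beach.jpg": b"x" * 1000}
    lossy_service = {}
    # Truncation stands in for a provider that down-samples on import.
    copy_items(photos, lossy_service, transform=lambda b: b[:100])
    print(verify_copy(photos, lossy_service))
    print(delete_if_verified(photos, lossy_service))
```

With move semantics the originals would already be gone by the time the mismatch was noticed; here the failed check leaves the source untouched.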

Re: Scraping: It's true that DTP can use the APIs of companies that aren't 'participating' in DTP. But we don't do it by scraping their UIs. We do it like any other app developer: by asking for an API key, which that service is free to decline to give. One of the foundational principles we cover in the white paper is that the source service maintains control over who, how, and when to give the data out via their API. So if they aren't interested in their data being used via DTP, that is absolutely their choice.

Re: Economics: As with all future-looking statements, we'll have to wait and see how it works out. But I'll give one anecdote on why I don't think this will happen. Google Takeout (which I also work on) allows users to export their data to OneDrive, Dropbox, and Box (as well as Google Drive). One of the reasons we wanted to make DTP is that we were tired of dealing with other people's APIs, as it doesn't scale well. Google should build adapters for Google, and Microsoft should build adapters for Microsoft. So with Takeout we tried the specialized transport method, but it was a lot of work, so we went with the DTP approach specifically to try to avoid having specialized transports.
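The scaling argument can be sketched roughly as follows. This is not DTP's actual interface: the registry, the function names, and the list-of-dicts "common format" are hypothetical. The idea is that each service writes one exporter and one importer for its own API, and any pair of services can then be connected without a transport written for that specific pair.

```python
from typing import Callable, Dict, List

# Hypothetical common format: a list of plain dicts.
Item = Dict[str, str]
Exporter = Callable[[], List[Item]]
Importer = Callable[[List[Item]], None]

exporters: Dict[str, Exporter] = {}
importers: Dict[str, Importer] = {}


def register(service: str, exporter: Exporter, importer: Importer) -> None:
    """Each service contributes exactly two adapters, for its own API."""
    exporters[service] = exporter
    importers[service] = importer


def transfer(src: str, dst: str) -> int:
    """Move items through the common format; returns how many were copied."""
    items = exporters[src]()
    importers[dst](items)
    return len(items)


def specialized_transports(n: int) -> int:
    """One transport per ordered (source, destination) pair."""
    return n * (n - 1)


def adapters_needed(n: int) -> int:
    """One exporter plus one importer per service."""
    return 2 * n


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(n, specialized_transports(n), adapters_needed(n))
```

For four services the two approaches need 12 transports versus 8 adapters; for ten services it is 90 versus 20, and the gap keeps widening because one count grows quadratically and the other linearly.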

DTP is still in the early phases, and I would encourage you, and everyone else, to get involved in the project (https://github.com/google/data-transfer-project) and help shape the direction of the project.

2 comments

Hey! Thanks for the response. If you don't mind, I have some questions and comments after reading through your feedback.

> We suspect that [the majority of users who use Takeout don't do it to leave Google] will be true for DTP, users will want to try out a new service, or use a complementary service, instead of a replacement.

Interesting, thanks. I think this sort of worldview makes sense from a certain perspective.

> 2) Users should absolutely be able to delete their data once they copy it.

This is an aspirational statement and not a requirement of DTP, so it's problematic from a public perception standpoint to make the claim that DTP provides the user with more control of their data when the control very much remains at the mercy of the data controller. Indeed, this project directly facilitates the opportunity for more data controllers to obtain copies of the subject's data.

> If they aren't interested in their data being used via DTP, that is absolutely their choice.

Can you clarify whether you are saying that the DTP Project will honor takedown requests from parties targeted by DTP tooling?

> Google should build adapters for Google, and Microsoft should build adapters for Microsoft.

Can you explain the business drivers that incentivize these companies to provide parity between their import and export capabilities? Does the DTP Project require parity between these capabilities?

> This is an aspirational statement and not a requirement of DTP, so it's problematic from a public perception standpoint to make the claim that DTP provides the user with more control of their data when the control very much remains at the mercy of the data controller. Indeed, this project directly facilitates the opportunity for more data controllers to obtain copies of the subject's data.

I don't really disagree with what you said, but I interpret things differently:

Without DTP, if you ask a data controller to delete your data, you have to trust that they do. There is very little way to verify that the deletion actually happened; you more or less need to rely on the reputation of the company. Nowadays they all should have published retention statements that describe their deletion practices in more detail, which helps some and allows for some recourse if in fact they aren't following them. But in general, for the average user, it comes down mostly to trust.

With DTP, nothing is worse. But users can now get their data into a new service more easily.

If DTP had move semantics, you would still have the same problem as above: it mostly comes down to trust.

It is true that after a copy there are now two copies of the data, which isn't ideal in terms of data minimization. But for the reasons I outlined previously, I think it is important to keep deletion as a separate action from copy. I do think that after a copy the option to delete the data should be presented to the user prominently, to make that as easy as possible if that is what they want to do.

So DTP isn't trying to solve every problem, but my take is that it makes some things better without making anything else significantly worse, so it's a net win.

> Can you clarify whether you are saying that the DTP Project will honor takedown requests from parties targeted by DTP tooling?

DTP doesn't really store data, so I don't think it is in scope for a traditional takedown request. But I think, more to the spirit of the question: yes, if a service doesn't want to grant a DTP host an API key, or revokes an API key, we wouldn't condone trying to work around that.

(One super detailed note, DTP is just an open source project, and doesn't operate any production code. A Hosting Entity can download/run the code. A Hosting Entity could be a company letting users transfer data in or out, or a user running DTP locally. Each Hosting Entity is responsible for acquiring API keys for all the services they want to interact with; including agreeing to and complying with any restrictions that that service might impose for access to their API.)

> Can you explain the business drivers that incentivize these companies to provide parity between their import and export capabilities? Does the DTP Project require parity between these capabilities?

This is a little bit of a bet on our part. I think Google has demonstrated, through its almost decade-long investment in Takeout, that giving users more control over their data leads to greater user trust, and that is good for business.

As for requiring parity, we cover this a bit in the white paper, but as you say, we recognize that reciprocity is key, and we need to incentivize services to invest equally in import and export; otherwise the whole thing falls apart.

Right now the stance we are taking is that reciprocity is strongly encouraged, and we will be collecting stats/metrics to try to measure it so we can name and shame folks that aren't following that. We hope that providing transparency around different services' practices in this area will allow users to make informed decisions about where to store their data.

An interesting thought experiment in this area: if a user wants to transfer data from service A to service B, but service B doesn't allow export back out, what should service A do? Ideally you force service B to support export, but on the other hand the user should be in control, and who is service A to say no? It's almost pitting the good of an individual user against the good of the ecosystem.

We are hoping that as the project, and the larger portability ecosystem, evolves, there emerges some kind of neutral governance model that can help mediate some of these issues. It is problematic for service A to decide that question, but a neutral group representing the interests of users will have more legitimacy in making these tough decisions.

Thanks for taking the time to provide these detailed follow ups. I'm still pretty wary of this project, but you've demonstrated that at least one person on the team is thinking through some of this stuff.

> An interesting thought experiment in this area is that if a user wants to transfer data from service A to service B, but service B doesn't allow export back out, what should service A do? Ideally you force service B to support export, but on the other hand the user should be in control, and who is service A to say no. It's almost putting the good of an individual user against the good of the ecosystem.

I'll offer that the European Union's answer to this -- the GDPR -- is to put the data subject first. It would be nice to see the DTP Project align with that position.

Please define 'delete' in this context. I'm afraid that if I transfer my data, the original will never be deleted.

Now I've doubled my problem.

In this context, "delete" should probably be understood to mean "removed from production systems, and retained only to the extent required to meet legal obligations".