by Ironlink 3321 days ago
First of all, I think they are too smart to jeopardize their entire cloud service in that way; no one would buy a service that did that. Ignoring that, I would think inspecting cloud customer data is much too unreliable to be of any use to them.

Before anything else, the format and schema of the customer data would have to be analyzed and converted to data structures that match Google's internal models. While I imagine a computer could do it, I certainly would want a human to verify that the analysis makes sense.
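To make the "analyze, then have a human verify" step concrete, here is a minimal sketch, not anything Google is known to do. The sample rows, the column names and the `infer_type` helper are all made up for illustration. It guesses a type per column from string samples and flags columns where the samples disagree, which is where a human would have to step in.

```python
def infer_type(values):
    """Guess a column type from string samples (hypothetical helper).

    Returns "int", "float" or "str" when the samples agree,
    and "ambiguous" when they do not.
    """
    def classify(value):
        try:
            int(value)
            return "int"
        except ValueError:
            pass
        try:
            float(value)
            return "float"
        except ValueError:
            return "str"

    kinds = {classify(v) for v in values}
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"int", "float"}:
        # A mix of whole and fractional numbers is still numeric.
        return "float"
    return "ambiguous"


# Invented customer rows; every field arrives as text.
rows = [
    {"id": "1", "price": "9.99", "zip": "85001"},
    {"id": "2", "price": "12", "zip": "N/A"},
]

schema = {col: infer_type([row[col] for row in rows]) for col in rows[0]}
needs_review = [col for col, kind in schema.items() if kind == "ambiguous"]
```

Even in this toy case the `zip` column cannot be typed automatically, and nothing here says what any column actually means.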

Assuming this has been done, they are then at the mercy of the customer as far as whether the data is accurate, whether it is complete, how often it is updated, etc.

At the end of the day, I don't see how they would build a reliable business around arbitrary data structures which they have no control over. Information you can't trust is pretty useless.

Edit: They would also have to understand how the data was selected. Looking at a series of data points, you would wonder if all these are from Arizona, or all from the year 1976, or all from color blind individuals. Without understanding such limitations, making any sort of deduction from a dataset will just lead you the wrong way.
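A small sketch of the selection problem, using invented numbers: the records, states and values below are assumptions for illustration only. If the customer happens to store only Arizona rows, an average computed from what is visible differs sharply from the average over the full population, and nothing in the data itself reveals that.

```python
from statistics import mean

# Hypothetical records: (state, year, value). All values are made up.
population = [
    ("AZ", 1976, 10.0),
    ("AZ", 1990, 12.0),
    ("NY", 1976, 30.0),
    ("NY", 1990, 34.0),
    ("WA", 1976, 20.0),
    ("WA", 1990, 26.0),
]


def average_value(records):
    """Average the value field of (state, year, value) records."""
    return mean(value for _, _, value in records)


# What an outside observer would see if the dataset only covers Arizona.
arizona_only = [r for r in population if r[0] == "AZ"]

full_avg = average_value(population)
biased_avg = average_value(arizona_only)
```

With these numbers the Arizona-only average is half the full average, yet the Arizona subset looks like a perfectly ordinary dataset on its own.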


Ok, so in general I agree with your statements, but...

> how they would build a reliable business around arbitrary data structures which they have no control over

Google Search? The entire web could be described exactly like that.

You make a great point. To me, Google Search is a bit of a special case in that the data it provides has been posted publicly. Unless the Google Cloud ToS give them the right to publicize your business data, that's out of the question.

Another point here is that Google Search isn't an authoritative source of information; it is up to the end user to inspect the returned links and decide whether they can trust each site. This is something that I would not try to automate to the point that I could ask users for money in exchange, and if it can't be automated, it doesn't seem like a great fit for Google.

The question is whether they can extract value out of the customers' data that sits on or passes through the servers and services that they manage, or whether it's too messy to be of use. And my answer to that is that this is Google's primary business model, so surely they could if they wanted to try. Whether they do (surely not) or should (definitely not) are different questions.
But at least webpages are in a standardized format. With the exception of images, some of the customer data might be in arbitrary formats or just not trivial to parse.
> standardized format†

†: Parses unreliably at best, Turing-complete at worst. (a.k.a. JavaScript, if you didn't catch that)

Webpages are, but the information isn’t. There are multiple ways to lay out your page, multiple ways to express your information, multiple languages, etc.
> Google Search? The entire web could be described exactly like that.

Nah, the user's mood may be ruined if the search results are junk, but ruining the model you're trying to build has vastly more costly consequences. I don't think the tech is there (just yet) to have some code simply ingest whatever comes its way, chew it up and use it well; required xkcd (today's!): https://xkcd.com/1838/