|
This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud, and you can lose tens of thousands of dollars. Last week I ran a script on BigQuery against historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever, and they won’t remove the fee. This official website should be updated to warn people that Google is apparently now hosting this dataset to make money. I don’t think that was the original mission, but that’s what it is today. There’s basically zero customer support, and you can lose $14k in the blink of an eye. Academics, especially grad students, need to be aware of this before they give a credit card number to Google. In fact, I’d caution against using this dataset at all with this new business model attached. |
What it is, roughly, is a publicly accessible data supercomputer. If you lost $14k in the blink of an eye, then I would think you consumed at least $4k of Google's actual resources -- maybe $7k, maybe more. That thing can move some serious data, and you apparently moved around over 2PB.
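For anyone who wants to sanity-check the "over 2PB" figure, here is a back-of-the-envelope sketch. It assumes a flat on-demand rate of about $5 per TB scanned, which is an assumption and not something stated in this thread, and the function name is made up for illustration.

```python
# Assumed on-demand price in USD per TB scanned. This rate is an
# assumption for illustration; it is not quoted anywhere in the thread.
ASSUMED_USD_PER_TB = 5.0


def estimated_tb_scanned(bill_usd: float, usd_per_tb: float = ASSUMED_USD_PER_TB) -> float:
    """Return the terabytes a bill implies at a flat per-TB rate."""
    if usd_per_tb <= 0:
        raise ValueError("usd_per_tb must be positive")
    return bill_usd / usd_per_tb


tb = estimated_tb_scanned(14_000)
print(f"{tb:,.0f} TB, about {tb / 1000:.1f} PB")
# prints "2,800 TB, about 2.8 PB"
```

At that assumed rate, a $14k bill works out to a bit under 3 PB scanned, which lines up with the "over 2PB" estimate. A different rate shifts the number proportionally.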
Google bears some significant responsibility for not making the cost transparent to you, it's true. But on the other hand, don't they deserve some significant credit for making such awesome power available to a lowly peon with a credit card?