by brk 5681 days ago
Do they even have the ability to repair the data? If information is logged in real-time, and there is no easy way to filter through billions of search query terms to de-dupe (or whatever fix may be required), it might not be possible to correct the dataset.

Care to elaborate on why you feel they cannot correct the data? I am having trouble understanding.
Because the original data on which the statistics are based was probably deleted a long time ago, and that data is the only way to get the 'right' numbers. The only other option would be to filter out the peak, but that would also discard all the real information in that timespan. Just too much bother.
I doubt Google, or any similar company that relies on data aggregation and collection, makes a habit of deleting data.
I would assume that Google, or any similar data aggregation company, would log and keep statistics and summary information but discard the actual raw data. They do have some of the biggest storage capabilities, but that's no reason to fill them with Apache logs.
For instance, if a spam site ranks for a query, banning it manually is only a last resort; they would prefer to change the next incarnation of the algorithm to block that spam.