by unp3rs0n 5608 days ago
I don't understand why everyone is using the term "copying the results". I think what Bing did was very smart: they incorporated user clickstream data. One could accuse this method of walking a thin line morally, but I suspect that Google's accusation wouldn't have held any water as a lawsuit.
3 comments

Because by incorporating clickstream data from Google, they're effectively copying Google search results. Bing should blacklist Google from its clickstream data.
Let's say tomorrow DDG is the search engine with the largest market share. Then Bing would be getting all the clickstream data from DDG. I hope you do realize that this "algorithm" is not Google specific. It's just a novel ranking technique that incorporates a human user feedback loop, and it is a pretty well known technique in the information retrieval field.
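To make the "human user feedback loop" idea concrete, here is a minimal sketch of click-based re-ranking. The class name and methods are hypothetical and purely illustrative; this is not a description of how Bing or any other engine actually does it.

```python
from collections import Counter, defaultdict


class ClickFeedbackRanker:
    """Toy re-ranker (hypothetical): URLs clicked more often for a query move up."""

    def __init__(self):
        # query -> Counter mapping each URL to its observed click count
        self.clicks = defaultdict(Counter)

    def record(self, query, url):
        """Log one observed click on `url` for `query`."""
        self.clicks[query][url] += 1

    def rerank(self, query, candidates):
        """Order candidates by click count, highest first.

        sorted() is stable, so URLs with equal counts keep their
        original relative order.
        """
        counts = self.clicks[query]
        return sorted(candidates, key=lambda url: -counts[url])


ranker = ClickFeedbackRanker()
ranker.record("some query", "b.example")
ranker.record("some query", "b.example")
reranked = ranker.rerank("some query", ["a.example", "b.example", "c.example"])
```

Nothing in this sketch cares where the click happened, which is the sense in which the technique is not tied to any one site.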
It would be equally unethical to be copying DDG's results in this fashion.
Highly ironic, though, as DDG uses Yahoo as a backend, which uses Bing, which uses Google, which would use...DDG? I think there's a cycle in that list somewhere...
Equally unethical sure (x=y), but differing opinions on whether that (x) is ethical or unethical. Personally, I see nothing wrong with it.
Not necessarily. It is possible that Google incorporates clickstream data too.

The problem that Google's little experiment highlighted was: given the utter lack of any other signal, Bing uses the fact that that URL was ranked #1 by Google's search engine and clicked on by a user.

Having said that: had I been doing this experiment at Google, I would have also added the following variations:

- for some search terms, rank the honeypot URL #1 but don't click on it

- for some search terms, rank the honeypot URL #1 on some _other_ search engine's list and click on it. How can they do that? There are search engines out there which use Google in the backend.

Experiment #1 would have shown more blatant copying. Experiment #2 would have shown whether it's just Google, or any other search engine.
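The variations above, together with the rank-and-click case already tested, form a small test matrix. Here is a rough sketch of it; the labels and the wording of the conclusions are my own assumptions, not anything Google published.

```python
from itertools import product


def conditions():
    """Every combination of where the honeypot is ranked #1 and whether it is clicked."""
    # "other" stands for a hypothetical engine that uses Google as a backend.
    engines = ("google", "other")
    return [{"engine": e, "clicked": c} for e, c in product(engines, (True, False))]


def interpret(condition, appeared_in_bing):
    """What the honeypot showing up in Bing would suggest under this condition."""
    if not appeared_in_bing:
        return "no evidence this signal is used"
    if not condition["clicked"]:
        return "ranking alone is picked up, without any user click"
    if condition["engine"] == "google":
        return "rank plus click on Google is picked up"
    return "rank plus click is picked up from engines other than Google"


for cond in conditions():
    print(cond, "->", interpret(cond, appeared_in_bing=True))
```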

Google has said they do not use clickstream data for ranking from Google toolbar.

They did have variations of their tests. Cutts mentioned this during the bigthink panel. Sometimes they went to Bing first or not, sometimes they clicked on the links and sometimes not, and various other things.

Google has said they do not use clickstream data for ranking from Google toolbar.

Please provide a reference. I've been looking for this statement and haven't found it. When Googlers are directly asked, they pointedly don't answer or say they don't know.

Amit Singhal's statement was carefully worded to be ambiguous on this matter, and Google has apparently confirmed that page-load-time data (at the very least) from the Toolbar does affect rankings.

Such use by search is definitely allowed by Google's written privacy policy. The confirm dialog a user passes when installing the Toolbar refers to that privacy policy.

It'd be very easy for an official Google spokesperson to say clearly that Toolbar data doesn't drive search rankings, if that were true. That they haven't strongly suggests it is used.

Search expert Danny Sullivan made the same observation in his 'Bing: Why Google’s Wrong In Its Accusations' article:

As For The Google Toolbar

Meanwhile, I’m on my third day of waiting to hear back from Google about just what exactly it does with its own toolbar. Now that the company has fired off accusations against Bing about data collection, Google loses the right to stay as tight-lipped as it has been in the past about how the toolbar may be used in search results.

Google’s initial denial that it has never used toolbar data “to put any results on Google’s results pages” immediately took a blow given that site speed measurements done by the toolbar DO play a role in this. So what else might the toolbar do?

http://searchengineland.com/bing-why-googles-wrong-in-its-ac...

You are being disingenuous. That you cannot find them explicitly denying something does not therefore make it true nor provide any evidence that it is true.

“Absolutely not. The PageRank feature sends back URLs, but we’ve never used those URLs or data to put any results on Google’s results page. We do not do that, and we will not do that,” said Singhal.

http://www.toprankblog.com/2006/04/matt-cutts-on-toolbar-dat...

In this one, Matt Cutts all but explicitly says that they do not use it.

"Put any results" is vague, perhaps intentionally finessed, language – as I (and the Sullivan quote) already highlighted in the grandparent comment. It could mean, especially in the context of the Bing allegations, that Google Toolbar data never adds a new URL to the index or a result-set, but is still used for relative ranking of already-known URLs.

In the link you provide, Matt Cutts says: "I’m not going to say definitively that Google doesn’t/won’t use toolbar data (or other signals) in ranking." And: "I’m not going to say whether Google uses a particular signal in our ranking." Cutts simply says Toolbar data could be problematic because it could be gamed. Well, links can too – didn't stop Google from building its empire on that impure signal. This seems to me more of the same finesse that creates the impression of a denial without a denial.

Further, it's clear from previous Google statements that page-load-timing from the Toolbar is used to affect rankings. That alone invalidates the 'strong' interpretation of Amit Singhal's statement. So Singhal means something other than "Toolbar data never affects search rankings". What does he mean? Just requoting that vague statement doesn't clear anything up.

And since everything Google does with this data is a closely-guarded secret, how can we be sure of anything, short of awaiting (and then trusting) definitive Google statements? And I can't yet find any clear statement about Toolbar data usage – even though lots of people seem to think they have seen them. (I think general warm feelings towards Google are creating this mistaken impression.)

I don't expect a clear statement; I believe Toolbar clicktrails are a big part of Google's secret sauce. But it means Google could be using equivalent techniques to Bing's, and simply have better filters against the blatant dominance of a single website or of only 20 clickers on any result set.

It's a subtle point, but the whole point of this issue is that Bing actually does copy more than just "clickstream data". Think about what "clickstream" data is: the user clicks a link, and the link and the text of the link get sent to Microsoft. But in Google's test the search term was not in the link text, nor was it in the target page the user went to. So then how does the search term end up in Bing's index? They have to get it from somewhere. Where? The logical conclusion is they get it from the URL of the page the click is performed on, or from the referrer header when the link is clicked (essentially these are the same thing). But how do you do that generically? The search URL contains the query terms in a format that is unique to Google. The only way Bing can be seeing it is if they are deliberately parsing out the Google search terms by specifically targeting how Google encodes them.

So rather than just generically "incorporating clickstream data", this is actually using a special procedure to extract search terms from clicks that are determined to be Google searches and then putting those search terms into the Bing index.
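For illustration, here is a minimal sketch of what engine-specific extraction from a referrer URL could look like. The host table and function names are hypothetical; the only facts assumed are that Google search URLs carry the terms in the `q` parameter and Yahoo's in `p`. This is not a claim about what Bing's code actually does.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table: which query-string parameter holds the search terms
# on a given host. The table is illustrative, not anything Bing is known to use.
QUERY_PARAM_BY_HOST = {
    "www.google.com": "q",
    "search.yahoo.com": "p",
}


def extract_search_terms(referrer):
    """Return the search terms encoded in a referrer URL, or None."""
    parts = urlparse(referrer)
    param = QUERY_PARAM_BY_HOST.get(parts.netloc.lower())
    if param is None:
        return None  # not a host this table knows how to parse
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None


def record_click(referrer, clicked_url, index):
    """Associate the clicked URL with the terms found in the referrer, if any."""
    terms = extract_search_terms(referrer)
    if terms is not None:
        index.setdefault(terms, []).append(clicked_url)


index = {}
record_click("http://www.google.com/search?q=made+up+term&hl=en",
             "http://example.com/honeypot", index)
```

A click arriving from a host that is not in the table yields no search terms at all, which is the crux of the argument: the terms only become available once something knows how that particular site encodes them.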

What Bing did is very smart! It actually was, but not crediting Google there just makes it look like a cheap shot.

You wouldn't quote someone without citing their name, now would you?

Sadly you are right: they won't be able to push a lawsuit, as there aren't enough grounds for it. However, Bing should acknowledge what they did and are doing, and credit Google.

If Google hadn't caught them, we would all be thinking Bing did it on its own, which really isn't the case.

Taking the liberty of concocting a scenario: if Walmart asked shoppers to take a photograph of the product layout on display at their favorite shop (which, say, happens to be Target because it's the most popular in town) and used that to make small modifications to its own layout, would you say Walmart needs to credit Target, or that it's copying Target? This is an arrangement between Walmart and its shoppers, and there is nothing Target can do about it other than making a brouhaha. I don't see why things have to be different in the digital world. We all know how user interfaces have historically been blatantly ripped off.
Actually, Bing should credit the users providing all that clickstream data and all the websites they are using. It just happens that Google is one of many... what's your point?