by pydry 2145 days ago
Are you arguing that it's somehow impossible or technically unreasonable for a startup search engine to piggyback off google's search index in a similar way to how duckduckgo piggybacks off bing's search index?

And that the very idea that this could be made possible through law makes you chuckle?


I can't answer for OP but I can say it makes me chuckle too.

If the solution was purely to force Google to sell access to their index then yes it seems possible on the surface.

But as mentioned index and ranking are inextricably tied together.

Even if they weren't, no other organization is going to be able to produce search results comparable to Google using their index. You're underestimating what goes on under the hood.

So then the answer (often in these conversations) becomes to open up the ranking algos too.

The problems with that are numerous so I'll just point out some of the bigger ones:

- Arms race: Search is a constant arms race between providers and 3rd parties trying to game the system. The minute you make the algos public, the gamers win the race. Search result quality returns to the way it was in the '90s and stays that way until someone else comes up with proprietary algos that work (but is that even legal at this point in our thought experiment?)

- Motivation: If search is open and you therefore can't directly profit from your efforts to improve it (because you automatically give away anything you create to competitors) where is your motivation to keep innovating?

- It's harder than you think: Truly, there's so much more going on in modern search indexing and ranking than you likely realize. The chances that some new organization (especially a gov organization) given access to Google's black box as it exists right now would be able to maintain search result quality for any significant length of time is essentially zero.

But let's imagine that it's as easy as many people think... Wouldn't the solution then be to build a public alternative rather than effectively killing what we have now?

>But as mentioned index and ranking are inextricably tied together.

I'm pretty sure they're quite extricably tied together. I'm almost certain google's engine weights the different ranking variables (e.g. page speed) differently depending upon context. Why not expose those variables to other search engines? Well, it would kill google's search engine dominance - if you're concerned with that...
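A toy sketch of what "weighting the ranking variables differently depending on context" could look like, assuming made-up signal names (`freshness`, `page_speed`, `link_score`) and made-up weights. It does not describe how Google actually ranks; it only shows that per-page signals and per-context weights are separate pieces of data.

```python
from typing import Dict, List

# Hypothetical per-context weights: how much each signal counts for a query context.
CONTEXT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "news": {"freshness": 0.6, "page_speed": 0.1, "link_score": 0.3},
    "reference": {"freshness": 0.1, "page_speed": 0.2, "link_score": 0.7},
}

# Hypothetical per-page signals, as an index might store them.
pages: Dict[str, Dict[str, float]] = {
    "fresh-blog": {"freshness": 0.9, "page_speed": 0.5, "link_score": 0.2},
    "old-encyclopedia": {"freshness": 0.1, "page_speed": 0.6, "link_score": 0.9},
}


def score(signals: Dict[str, float], context: str) -> float:
    """Weighted sum of a page's signals, using the weights for this context."""
    weights = CONTEXT_WEIGHTS[context]
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


def rank(context: str) -> List[str]:
    """Order pages by their context-specific score, best first."""
    return sorted(pages, key=lambda p: score(pages[p], context), reverse=True)


print(rank("news"))       # the fresh page wins when freshness is weighted heavily
print(rank("reference"))  # the well-linked page wins when links dominate
```

The same stored signals produce different orderings purely because the weights change, which is the separation the comment is arguing could be exposed.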

Unbundling isn't technically infeasible and it would create more competition. This would help with the arms race alluded to. What if another search engine used Google's index to build a more spam-free index? Not good for Google, but great for Joe Public.

>Motivation: If search is open and you therefore can't directly profit from your efforts to improve it (because you automatically give away anything you create to competitors) where is your motivation to keep innovating?

Nothing saying that they can't make money from the users of their index just as Bing makes money from DDG. However, is there any reason their search engine shouldn't compete with other similar offerings? Maybe somebody out there does it better.

Well, other than "we've got an unfair advantage and we'd like to keep it, please".

>It's harder than you think

Actually it's probably a LOT easier. This idea is a direct attack on Google's power and the easiest response they can make is "too difficult, not possible". Not "this would fuck us in the bottom line". Simply "we can't do it, who are YOU to tell us it's possible?"

FWIW, if you look through history, similar reactions were made to attempts to regulate pretty much all utilities. Then it happened. This kind of response is an expected part of the process. Most recently it happened in the UK when utilities and banks were told to open up API access to their data. Same claim you made.

Part two is when they tell you "it's not fair!". It's coming.

Bing's index and algos are not available to DDG, there's no comparison there. DDG uses Bing's results, they can't see how they're produced. Incidentally, Google offers a similar API.

> Actually it's probably a LOT easier

Can you support that claim?

Just the scale alone is mind-boggling when it comes to search.

Then throw in natural language processing, contextual signals, hubs and authorities, content categorization (which grows ever closer to looking like actual understanding), machine learning, a host of other basic and ever-evolving quality signals that exist both independently and interdependently, the more complex signals that arise from all of the above, and on and on.
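"Hubs and authorities" refers to Kleinberg's HITS link-analysis algorithm: a good hub links to good authorities, and a good authority is linked to by good hubs. A minimal power-iteration sketch on a tiny made-up link graph (page names are hypothetical); it illustrates just one of the many signals listed, and says nothing about how any production engine combines it with the others.

```python
import math
from typing import Dict, List, Tuple


def hits(links: Dict[str, List[str]], iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores for every page in a directed link graph."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of the authority scores of pages it links to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Made-up graph: "portal" links to three pages, "blog" to two, "a" to one.
links = {"portal": ["a", "b", "c"], "blog": ["a", "b"], "a": ["c"]}
hub, auth = hits(links)
print(max(hub, key=hub.get))    # the page with the most outgoing links to good pages
print(max(auth, key=auth.get))  # a page endorsed by the strongest hubs
```

Scores are L2-normalized each round so they converge instead of growing without bound.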

Search is hard. Even the most casual of Googling (or maybe Binging would be apt in this case) will provide you with endless info about how hard it is.

"Search is hard" deliberately misses the point. This isn't about whether search is hard; it's about whether decoupling the search engine from the search index is hard, i.e. whether the APIs used by one could be used by others.
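A sketch of what "decoupling search engine from search index" might mean in code, assuming a hypothetical `Index` interface with `lookup` and `signals` methods (neither is a real Google API). Two independent rankers consume the same index; the second drops documents with a high hypothetical `spam` signal, along the lines of the "more spam-free" engine suggested earlier in the thread.

```python
from typing import Dict, List, Protocol, Set


class Index(Protocol):
    """Hypothetical index API: which docs contain a term, plus per-doc signals."""

    def lookup(self, term: str) -> Set[str]: ...

    def signals(self, doc_id: str) -> Dict[str, float]: ...


class InMemoryIndex:
    """Toy index owned by one party and exposed to everyone through lookup/signals."""

    def __init__(self, docs: Dict[str, dict]) -> None:
        self._docs = docs
        self._postings: Dict[str, Set[str]] = {}
        for doc_id, doc in docs.items():
            for term in doc["text"].lower().split():
                self._postings.setdefault(term, set()).add(doc_id)

    def lookup(self, term: str) -> Set[str]:
        return set(self._postings.get(term.lower(), set()))

    def signals(self, doc_id: str) -> Dict[str, float]:
        return self._docs[doc_id]["signals"]


def popularity_ranker(index: Index, term: str) -> List[str]:
    """One engine: rank matching docs by a hypothetical 'links' signal."""
    return sorted(index.lookup(term), key=lambda d: index.signals(d)["links"], reverse=True)


def spam_filtering_ranker(index: Index, term: str) -> List[str]:
    """A competing engine on the same index: drop likely spam, then rank."""
    candidates = [d for d in index.lookup(term) if index.signals(d)["spam"] < 0.5]
    return sorted(candidates, key=lambda d: index.signals(d)["links"], reverse=True)


idx = InMemoryIndex({
    "d1": {"text": "moose facts", "signals": {"links": 10, "spam": 0.9}},
    "d2": {"text": "moose migration", "signals": {"links": 5, "spam": 0.1}},
    "d3": {"text": "caribou herds", "signals": {"links": 7, "spam": 0.0}},
})
print(popularity_ranker(idx, "moose"))
print(spam_filtering_ranker(idx, "moose"))
```

The rankers only depend on the `Index` protocol, so either could be swapped out without touching the index itself.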

This trick you're echoing is, incidentally, used every single time a government comes looking at unbundling opportunities. I remember distinctly how Microsoft claimed it was "too hard" to decouple Office from Windows. The banks in the UK made the same claim. They were all equally ridiculous and all equally self-serving. There is a lot of precedent here.

Google will pretend that it is "impossibly hard" to expose their internal APIs as well, just as every other company did. It would be surprising if they didn't.

Search is hard, yes. A lot harder than exposing APIs, isn't it?

Bing doesn't make its search index public. It has a query api, where you can provide a search query and get Bing's results on a per-query basis.

The closest comparison to what this person wants is Common Crawl, which is minuscule compared to the big players, and is a 100TB gzipped download that gets updated monthly.

Bing provides API access to its search index. IIRC the type of queries you can do are a bit more sophisticated than just "what's available via the Bing search engine" (e.g. select a market).

That's not index access. You are getting ranked results. Index access would give you the posting list for the term "moose", or the intersection of the posting lists of "moose" and "caribou", or whatever.
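To make the distinction concrete: a posting list is the sorted list of IDs of documents containing a term, and an AND query like "moose caribou" is answered by intersecting posting lists with a linear merge. A minimal in-memory sketch over made-up documents; real indexes compress and shard these lists, which this ignores.

```python
from typing import Dict, List


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the ascending list of doc IDs that contain it."""
    postings: Dict[str, List[int]] = {}
    for doc_id in sorted(docs):  # visiting IDs in order keeps every list sorted
        for term in set(docs[doc_id].lower().split()):
            postings.setdefault(term, []).append(doc_id)
    return postings


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear-time merge of two sorted posting lists (AND query)."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


docs = {1: "moose and caribou", 2: "moose only", 3: "caribou herd", 4: "Moose caribou range"}
postings = build_index(docs)
print(postings["moose"])                                  # posting list for "moose"
print(intersect(postings["moose"], postings["caribou"]))  # docs containing both terms
```

The result is an unranked set of matches; deciding which of them comes first is exactly the part a query API does for you and keeps hidden.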

You can't make a neutral service that returns ranked results, because that's contradictory.