by ChuckMcM 410 days ago
At Blekko we advocated for this as well.

Google has two interlocked monopolies: one is the search index and the other is their advertising service. We often joked that if Google priced access to their index on reasonable and non-discriminatory terms, both to themselves and to others, AND allowed anyone to put whatever ads they wanted on those results, that would change the landscape dramatically.

Google would carve out their crawler/indexer/ranker business and sell access to it, both to themselves and to others. That business would then have an income that did NOT go back to the parent company (it would have to be spent inside the business as capex or opex).

Then front ends would have a good shot. DDG, for example, could front the index with the value proposition of privacy. Someone else could front the index with a value proposition of no ads, ever. A third party might front that index attuned to specific use cases like literature search.
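To make the proposed split concrete, here is a minimal Python sketch, assuming a hypothetical `IndexService` that charges every caller the same per-query price and returns plain organic results, with three hypothetical front ends layered on the same index. The class and function names, the price, and the naive title-match "ranking" are all illustrative assumptions, not a description of how Google or Blekko actually built anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    url: str
    title: str
    sponsored: bool = False


@dataclass
class IndexService:
    """Hypothetical stand-alone crawler/indexer/ranker business.

    Every caller, including the parent company's own front end, pays the
    same per-query price, and that revenue stays inside this business.
    """
    docs: List[Result]
    price_per_query: float = 0.002  # hypothetical price in dollars
    revenue: Dict[str, float] = field(default_factory=dict)

    def search(self, customer: str, query: str) -> List[Result]:
        # Identical price for every customer: the "non-discriminatory" part.
        self.revenue[customer] = self.revenue.get(customer, 0.0) + self.price_per_query
        terms = query.lower().split()
        # Naive title match stands in for real ranking; no ads are attached here.
        return [d for d in self.docs if any(t in d.title.lower() for t in terms)]


def ad_supported_frontend(index: IndexService, query: str) -> List[Result]:
    # This front end sells its own ads and places them above the organic results.
    ads = [Result("https://ads.example/shoes", "Buy shoes", sponsored=True)]
    return ads + index.search("ad-frontend", query)


def no_ads_frontend(index: IndexService, query: str) -> List[Result]:
    # Same index, different value proposition: organic results only, ever.
    return index.search("no-ads-frontend", query)


def literature_frontend(index: IndexService, query: str) -> List[Result]:
    # A vertical front end that keeps only (hypothetical) scholarly hosts.
    scholarly_hosts = ("arxiv.org", "doi.org")
    return [r for r in index.search("lit-frontend", query)
            if any(host in r.url for host in scholarly_hosts)]


if __name__ == "__main__":
    # Made-up sample documents.
    index = IndexService([
        Result("https://arxiv.org/abs/0000.00000", "Search engine ranking study"),
        Result("https://shop.example/running", "Running shoes ranking"),
    ])
    print(len(ad_supported_frontend(index, "ranking")))  # 3
    print(len(no_ads_frontend(index, "ranking")))        # 2
    print(len(literature_frontend(index, "ranking")))    # 1
    print(sorted(index.revenue))
```

The index business in this sketch is indifferent to which front end is calling, so the front ends compete on presentation (ads, no ads, vertical filtering) rather than on who owns the crawl.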

It would be a very different world.


Access to the click stream is the big bit.

I.e., knowing which users clicked which search results.

Without the click stream, one cannot build or even maintain a good ranker. With a larger click stream from more users, one can make a better ranker, which in turn makes the service better so more users use it.

End result: monopoly.

The only solution is to force all players to share click stream data with all others.

Click stream is useful, without a doubt. It isn't essential. At Blekko we had already started moving to alternative ways of ranking the index.

That said, if you run the front end as proposed, you get to collect the clicks, which gives you the click stream you want. If the index returns a SERP with unwrapped links (which it should, if it were unbundled from any particular search front end), then you could develop analytics around which links your particular customers "like" and end up with a different ranking than some other front end. One thing Blekko made really clear to me: contrary to the Google idea that there is always a single "best" result for a query (aka the I'm Feeling Lucky link), there are often different shades of intent behind a query that aren't part of the query itself. Google felt they could cover those in the first 10 links (back before the first 10 links were sponsored content :-)), and often on the page you could see the two or three inferred "intents" (shopping, information, and entertainment were common).
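Here is a minimal Python sketch of a front end building its own ranking from its own clicks on unwrapped links. It assumes a smoothed click-through rate where the index's original order supplies the prior; the `ClickReRanker` name, the 1/(position+1) prior, and the smoothing weight are hypothetical choices for illustration, not what Blekko did.

```python
from typing import Dict, List, Tuple


class ClickReRanker:
    """Re-rank a front end's unwrapped index results using its own click log.

    Score = smoothed click-through rate, where the index's original order
    supplies the prior. The 1/(position+1) prior and the smoothing weight
    are hypothetical choices for illustration.
    """

    def __init__(self, prior_strength: float = 10.0) -> None:
        self.prior_strength = prior_strength
        self.impressions: Dict[Tuple[str, str], int] = {}
        self.clicks: Dict[Tuple[str, str], int] = {}

    def log_impressions(self, query: str, urls: List[str]) -> None:
        # Record that each URL was shown for this query.
        for url in urls:
            key = (query, url)
            self.impressions[key] = self.impressions.get(key, 0) + 1

    def log_click(self, query: str, url: str) -> None:
        # Unwrapped links mean the front end sees exactly which URL was clicked.
        key = (query, url)
        self.clicks[key] = self.clicks.get(key, 0) + 1

    def rerank(self, query: str, urls: List[str]) -> List[str]:
        def score(item: Tuple[int, str]) -> float:
            position, url = item
            prior = 1.0 / (position + 1)  # trust the index's order when data is thin
            shown = self.impressions.get((query, url), 0)
            clicked = self.clicks.get((query, url), 0)
            return (clicked + self.prior_strength * prior) / (shown + self.prior_strength)

        ranked = sorted(enumerate(urls), key=score, reverse=True)
        return [url for _, url in ranked]


if __name__ == "__main__":
    ranker = ClickReRanker()
    serp = ["https://a.example", "https://b.example", "https://c.example"]
    for _ in range(20):
        ranker.log_impressions("jaguar", serp)
    for _ in range(15):
        ranker.log_click("jaguar", "https://c.example")
    # ['https://c.example', 'https://a.example', 'https://b.example']
    print(ranker.rerank("jaguar", serp))
```

Two front ends whose users mean different things by the same ambiguous query would accumulate different click logs, and so would end up with different orderings of the same index results.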

I don't think that's quite true, as competitors like Kagi have been able to compete well with effectively zero clickstream (by comparison). It'll help, but it's not the make-or-break that the index is.
I think a click stream isn't necessary, but Kagi is not a good basis for the argument in my opinion.

Kagi is primarily a meta search engine. The click stream exists on their sources (Bing, Google, Yandex, Marginalia; not sure if they use Brave). They do have Teclis, which is their own index, plus their systems for reordering the page of results, such as downranking ad-heavy pages and applying user preferences (which I love).

https://seirdy.one/posts/2021/03/10/search-engines-with-own-... is a source I would recommend checking out if you are curious.

Kagi sends searches to other providers (Bing?) and then simply re-ranks the results, so they're effectively inheriting the click stream data of those other providers.
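Re-ranking someone else's results can be sketched in Python as follows. This assumes reciprocal rank fusion (RRF), a well-known technique swapped in here for illustration, to merge several providers' lists, followed by hypothetical per-domain user weights (raise, lower, block). It is not a claim about how Kagi actually does it; `metasearch` and `domain_weights` are made-up names.

```python
from typing import Dict, List
from urllib.parse import urlparse


def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Merge ranked URL lists from several upstream providers.

    Each URL earns 1 / (k + rank) from every list it appears in (rank is 1-based).
    k = 60 is the constant commonly used with RRF.
    """
    scores: Dict[str, float] = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    return scores


def metasearch(result_lists: List[List[str]], domain_weights: Dict[str, float]) -> List[str]:
    """Fuse upstream results, then apply per-domain user preferences.

    domain_weights is a hypothetical user setting: >1 raises a domain,
    <1 lowers it, 0 blocks it; unlisted domains keep weight 1.
    """
    fused = reciprocal_rank_fusion(result_lists)
    weighted: Dict[str, float] = {}
    for url, score in fused.items():
        weight = domain_weights.get(urlparse(url).hostname or "", 1.0)
        if weight > 0:
            weighted[url] = score * weight
    return sorted(weighted, key=weighted.get, reverse=True)


if __name__ == "__main__":
    provider_a = ["https://a.example/1", "https://ads.example/x", "https://b.example/2"]
    provider_b = ["https://ads.example/x", "https://b.example/2"]
    print(metasearch([provider_a, provider_b], {}))
    print(metasearch([provider_a, provider_b], {"ads.example": 0.5}))
```

Note that the fused order is still driven by how each upstream provider ranked the URLs, which is the sense in which a re-ranker inherits the providers' click stream data.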
> Google has two interlocked monopolies, one is the search index

The index is the farthest thing from a monopoly Google has - anyone can recreate it. Heck, you can even just download Common Crawl to get a massive head start.

I see it a bit differently: many (most?) web sites explicitly deny scraping except for Google. Further, Google has the infrastructure to crawl several trillion web pages and create a relevant index out of the most authoritative 1.5 trillion. To re-create that on your own, you would need both the web to allow it and the infrastructure to do it. I would agree that this isn't an insurmountable moat, but it is a good one.
Most websites only explicitly deny scraping by bad bots (robots.txt). Things like Cloudflare are a completely different matter, and I have a whole batch of opinions about how they are destroying the web.
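The difference between "deny everyone except Google" and "deny only bad bots" is easy to check with Python's standard `urllib.robotparser`. The two robots.txt policies below are made-up examples, and `MyNewCrawler` and `BadBot` are hypothetical user agents. Keep in mind that robots.txt is advisory, and network-level blocking of the kind Cloudflare does is not visible to a check like this at all.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Policy A (hypothetical): only Google's crawler may fetch anything.
GOOGLE_ONLY = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Policy B (hypothetical): everyone may crawl except one named bad bot.
BLOCK_BAD_BOT = """\
User-agent: BadBot
Disallow: /
"""

if __name__ == "__main__":
    page = "https://example.com/some/page"
    for name, policy in [("google-only", GOOGLE_ONLY), ("block-bad-bot", BLOCK_BAD_BOT)]:
        for agent in ("Googlebot", "MyNewCrawler", "BadBot"):
            print(name, agent, allowed(policy, agent, page))
```

Under the first policy a new search engine's crawler is shut out even though it behaves well; under the second it is welcome. Which of the two is more common on the web is the actual disagreement here.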

I'd love to compete directly with OpenAI, but the cost of a half million GPUs is a me problem - not a them problem. Google can't be faulted for figuring out how to crawl the web in an economically viable way.

Then why do we see all of these alt search engines and SEO services building out independent indexes? Why don't the competitors cooperate in this fashion already?
Because everyone worships Thiel's "competition is for losers" and dreams of being a monopoly. Monopolies are the logical outcome of a deregulated environment, which is exactly what these companies lobby for.
Throughout history there have been very few monopolies, and they don't normally last that long, unless they are granted special privileges by the government.
Concentration is the default in an unregulated environment. Sure, pure monopolies with 100% market control are rare, but concentration is rampant: a handful of companies dominate tech, airlines, banks, and media.
Concentration seems much more prevalent in heavily regulated markets, e.g. utilities and airlines. In many cases regulators have even encouraged it, e.g. in finance.

There is no default for unregulated markets. It's a question of whether the economies of scale outweigh the added costs from the complexity that scale requires. It costs close to 100x as much to build 100 houses, run 100 restaurants, or operate 100 trucks as it does to do 1. That's why these industries are not very concentrated. Whereas it costs nowhere close to 100x for a software or financial services company to serve 100x the customers, so software and finance are very concentrated.

The effect of regulation is typically to increase concentration, because the cost of compliance actually tends to scale very well. So businesses that grow face a decreasing regulatory compliance cost as a percentage of revenue.
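The scale argument in the two paragraphs above reduces to a fixed-plus-variable cost model. Here is a back-of-the-envelope Python sketch; the linear model and every figure in it are made-up assumptions, chosen only to show why 100x the output costs nearly 100x for a builder but only about 3x for a software business, and why a roughly fixed compliance cost shrinks as a share of revenue.

```python
def total_cost(units: int, fixed: float, per_unit: float) -> float:
    # Simple linear cost model: a fixed cost plus a constant cost per unit served.
    return fixed + per_unit * units


def scale_ratio(base_units: int, fixed: float, per_unit: float, factor: int = 100) -> float:
    # How many times more it costs to serve `factor` times as many units.
    return total_cost(base_units * factor, fixed, per_unit) / total_cost(base_units, fixed, per_unit)


def compliance_share(compliance_cost: float, revenue: float) -> float:
    # A roughly fixed compliance cost as a fraction of revenue.
    return compliance_cost / revenue


if __name__ == "__main__":
    # All figures are hypothetical, chosen only to show the shape of the argument.
    builder = scale_ratio(base_units=1, fixed=20_000, per_unit=300_000)       # houses
    software = scale_ratio(base_units=10_000, fixed=5_000_000, per_unit=10)   # customers
    print(f"builder, 100x the houses:     {builder:.1f}x the cost")   # 93.8x
    print(f"software, 100x the customers: {software:.1f}x the cost")  # 2.9x
    print(f"compliance share, small firm: {compliance_share(2_000_000, 10_000_000):.1%}")     # 20.0%
    print(f"compliance share, large firm: {compliance_share(2_000_000, 1_000_000_000):.1%}")  # 0.2%
```

Whether real compliance costs are as close to fixed as this toy model assumes is exactly what the rest of the thread argues about.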

You are comparing apples and oranges. You just can't compare the barrier to entry for a software business and an airline, even without any regulations. It's just orders of magnitude more expensive to buy an airplane than a laptop, and most utilities are natural monopolies, so they behave fundamentally differently.
Home building is interesting because I think a major blocker to monopoly-forming is the vastly heterogeneous and complicated regulatory landscape, with building codes varying wildly from place to place. So you get a bunch of locally-specialized builders.

Regulation can increase concentration in a high corruption/cronyism environment — regulatory capture and regulatory moats. There is plenty of that happening.

In building, I think we have local-concentration, due to both regulatory heterogeneity and then local cronyism - Bob has decades of connections to the city and gets permits easily, whereas Bob’s competitor Steve is stuck in a loop of rejection due to a never ending list of pesky reasons.

Concentration is not monopoly, and furthermore your comment does not begin to address the critical part of the parent’s comment: “does not last very long”.

Inequality at a point in time, and over time, is not nearly as bad if the winners keep rotating.

Airlines? Worst example ever. There are lots of airlines coming and going. "Tech" isn't even an industry.
> unless they get are granted special privileges by the government

That's what all the lobbyists are for.

None of the people or organisations that advocate for "free markets" or competition actually want free markets and competition. It's a smoke screen so they can keep buying politicians to get their special privileges.

They always inevitably end up being given special privileges.

Because, contrary to what we would all like to believe, once a company becomes large we don't want them to go under, even if they're not optimal.

There are a huge number of jobs, and a lot of institutional knowledge, processes, capital, etc., in these big monopolies. Like, if Boeing just went under today, how long would it take for another company to figure out again how to make airplanes? I mean, take a look at NASA. We went to the moon, but can we do it again? It would be very difficult, because so many engineers retired and IP was allowed to just... rot.

It's a balancing act. Obviously we want to keep the market as free as possible and yadda yadda invisible hand. But we also have national security to consider, and practicality.

> Throughout history there are very few monopolies and they don't normally last that long

That's completely incorrect. Historically, monopolies were pretty long-lived. So much that they were often written into the legal codes.

It's only fairly recently that the pace of innovation picked up so much that monopolies don't really die per se; they just become irrelevant.

This sounds like a solution contrived to advantage companies that want access to this data rather than an actual economically valid business model. If building an index and selling access to it is a viable business, then why isn't someone doing it already? There's minimal barrier to entry. Blekko has an index. Are you selling access to it for profit?
There are search engines that sell api access to their index. Pretty sure Bing, Yahoo, and Yandex all do.

Blekko also did, 10 years ago. When they still existed.

I think Brave Search does too?
We do: brave.com/api
You mean like a white label search engine? Customized with settings?