Hacker News
by jiggawatts 994 days ago
One way Google maintains their monopoly is that many websites block all bots except for the Google indexing bot.

This makes it literally impossible for anyone to build a competing search engine, because millions of doors are slammed in their face. There is no practical way to negotiate access at this scale either, which leaves no options for small startups (e.g., AI-based search)!

One possible anti-monopoly measure would be to force Google to mask the identity of their bots, for example by making them use a random IP address pool that third parties can also use without their permission.

IMHO breaking up corporations is a bit heavy-handed and not the only remedy available.


>IMHO breaking up corporations is a bit heavy-handed and not the only remedy available.

Respectfully, I disagree in Google's case. Indexing is not their only advantage: having full, unfettered access to and control over email, maps, Android, Play, cloud, and myriad other divisions, plus who knows what else via side deals, is too much and fully justifies breaking up the business.

Your suggestion would work if it were still 2002 or so, before they got too big.

Microsoft has all the same services that Google has, including the ones you just mentioned and more.

Bing, Outlook, Bing Maps, Microsoft Store, Azure, Windows, Edge, Office, OneDrive, the Xbox division, and more.

If they can't compete with Google, then maybe they aren't offering as good a service for most of these, and it's not for lack of trying to manipulate the market into using their services.

I do agree with what you said, though: both Alphabet and Microsoft should be broken up, and I don't mean just having multiple separate companies that operate closely together, but proper separation. This also goes for Meta and Amazon.

It's not for a lack of trying, but Google has their tentacles everywhere.

You literally can't make a mobile phone today without supporting all Google services. Native support is better than third-party support. And now all of your users' data is in Google's hands.

Yes, there are some extreme ... fanatics(?), who can live with a phone that doesn't have YouTube, Gmail, Google Maps, Google Drive or Play Store in it, but the Joe/Jill Regular will never ever go for that today.

Of all the companies besides Apple, Samsung is the only one I can think of with the muscle to maybe make a device with all of its own services.

> Yes, there are some extreme ... fanatics(?), who can live with a phone that doesn't have YouTube, Gmail, Google Maps, Google Drive or Play Store in it, but the Joe/Jill Regular will never ever go for that today.

I would fit your definition of fanatic, and I have a friend who does too. The only Google service we still use is YouTube, because that’s the hardest one to replace entirely unless you want to cut yourself off from a lot of very interesting and entertaining content. Otherwise it’s not so hard. Gmail is hardly the best offering on the market, Apple Maps is good enough for most tasks (Google Maps still has better POI data, sadly), etc. You don’t need to be a “fanatic” to de-googleify your life, even if only partially.

With Apple it's doable, yes. I'm in the same situation: YouTube is the only one of their services I still use.

My point was that a third party, besides Google and Apple, has a snowball's chance in hell of bringing a de-googled mainstream device to market.

Microsoft had a chance with Windows Mobile, but they messed it up.

If you are running a custom ROM such as LineageOS or some other version of Android, check out NewPipe on F-Droid. I've installed it for normie Android users and they stopped using the official YouTube front end. It's awesome.

It's funny how eager Americans are to break up the Big Tech companies that happen to be the only reason the US economy is not a complete dumpster fire. Look at GDP growth for the US vs. the Eurozone since 2008. The high level of integration, both horizontal and vertical, is what makes the US Big Tech companies so economically productive and valuable.

But sure. Kill your software industry. Does the US even know how to do anything else? Does the US even manufacture anything physical anymore? Last I checked tiny little Denmark produces more wind turbines than the US and tiny little Switzerland produces more CNC machines than the US.

> the only reason the US economy is not a complete dumpster fire.

The US economy is a complete dumpster fire for those of us worth less than seven figures. Big tech monopolization does not strengthen our productive capacity, and breaking up the big tech companies won't weaken it. If anything, it will strengthen the economy by making it more viable to found a tech startup without the explicit goal of being bought out. We aren't "killing" our software industry; we're revitalizing it.

Idk. This makes me super nervous. Look at what the Bell breakup did for innovation in software. The transistor was literally invented there before it was broken up. Likewise, I think it's probable that if Google is broken up, you won't see anything as innovative as transformer models, GFS, or Spanner anymore. I think quality of life for employees will likely go down, as it did when Bell Labs was broken up.

Different time. Bell was one of the last great breakups and occurred just before the modern pro-Big Business, ultra-financialized era kicked off in earnest. If Bell hadn't been broken up, Bell Labs would have been shut down or severely trimmed regardless, as the execs diverted the R&D spend to stock buybacks or other inefficient uses of funds that benefit the shareholders over all else (e.g., Boeing).

I think a better parallel is the late-20s early-30s Robber Baron breakups. There's been much ink and elections spilled over how that anti-trust era contributed directly to the proliferation of post-war innovation (e.g. https://bookshop.org/p/books/goliath-the-100-year-war-betwee...)

Is Google not on the bleeding edge of research into LLMs? Are they not investing billions into cloud? Google Research has quantum computers, and is investing in self-driving cars, healthcare, and god knows what else.

> The US economy is a complete dumpster fire for those of us worth less than seven figures.

Not sure what you mean: you can invest in VTI, wait 30 years, and hit 7 figures in real (inflation-adjusted) terms. The US economy is stronger than almost every other economy on the planet, and that holds even for the non-rich.
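A quick way to check the "7 figures in 30 years" arithmetic is the future value of a fixed annual contribution. The sketch below assumes a constant 7% real (inflation-adjusted) annual return and end-of-year contributions; that rate is an illustrative assumption, not a forecast for VTI, and actual returns vary from year to year.

```python
def future_value(annual_contribution: float, real_rate: float, years: int) -> float:
    """Future value of equal end-of-year contributions (an ordinary annuity)."""
    if real_rate == 0:
        return annual_contribution * years
    return annual_contribution * ((1 + real_rate) ** years - 1) / real_rate


def required_contribution(target: float, real_rate: float, years: int) -> float:
    """Annual contribution needed to reach `target` after `years`."""
    if real_rate == 0:
        return target / years
    return target * real_rate / ((1 + real_rate) ** years - 1)


if __name__ == "__main__":
    rate = 0.07  # ASSUMED constant real annual return, for illustration only
    need = required_contribution(1_000_000, rate, 30)
    print(f"Contribute about ${need:,.0f}/year in today's dollars")
```

Under those assumptions the required contribution comes out to roughly $10,600 a year; a lower assumed return raises it considerably.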

> Not sure what you mean

Look at the risks a poorer person in the US has to contend with, many of them ruinously expensive, as well as the difficulty of finding places to live that offer decent job markets, services, and amenities and are also affordable.

Life here can get really ugly for people that are well above the 'poverty line'. That's not to say it's really the overall economy making things hard for them, but more like microeconomic constraints. However, that's the lens they see the economy through.

I’m all for breaking them up but I’m not convinced it will revitalize the industry. Google is enormous. Even if it is broken up, the resulting pieces will still be massive, and almost impossible to compete against.

Yes, the United States is the #2 manufacturer after China, producing 16.6% of manufacturing output with just 4.2% of the world population.

https://worldpopulationreview.com/country-rankings/manufactu...

Hey, that's a great point! Meanwhile you have China, Russia, Israel, and Korea granting all sorts of anticompetitive advantages AND funding to single tech companies to corner the GLOBAL market. One need only look at TikTok, Temu, and Huawei.

I can buy this argument for Apple, but Google? To me it seems like there is an intuitive split between Search+Ads, GSuite, YouTube, and Android. Why do you believe breaking these up would adversely affect the US economy?

That's not true.

First of all, no website owner in their right mind is going to block Bing.

Second of all, they have to detect that you're a bot in the first place. Most sites don't employ sophisticated anti-bot technology. Simply running a headless browser (which you need to do anyways for JavaScript) with a common user-agent, and slow enough browsing not to be rate-limited, will let you index 99.9+% of sites.
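The "slow enough browsing not to be rate-limited" part can be sketched as a per-host scheduler that enforces a minimum gap between requests to the same host. The class name and the 5-second default are hypothetical choices, and the actual fetching (headless browser or otherwise) is left out so the scheduling logic stands on its own.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class PoliteScheduler:
    """Enforces a minimum delay between requests to the same host.

    Hypothetical helper: fetching itself is out of scope here.
    """

    def __init__(self, min_delay_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.min_delay_s = min_delay_s  # assumed per-host politeness interval
        self.clock = clock              # injectable so tests need no sleeping
        self._last_hit: Dict[str, float] = {}

    @staticmethod
    def _host(url: str) -> str:
        return urlsplit(url).netloc.lower()

    def wait_time(self, url: str) -> float:
        """Seconds to wait before this URL's host may be requested again."""
        last = self._last_hit.get(self._host(url))
        if last is None:
            return 0.0
        return max(0.0, self.min_delay_s - (self.clock() - last))

    def record(self, url: str) -> None:
        """Mark that a request to this URL's host has just been made."""
        self._last_hit[self._host(url)] = self.clock()

    def acquire(self, url: str) -> None:
        """Sleep until the host is eligible, then record the request."""
        delay = self.wait_time(url)
        if delay > 0:
            time.sleep(delay)
        self.record(url)
```

Keying the delay per host rather than globally lets total crawl throughput grow with the number of distinct sites while each individual site sees only a trickle of requests.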

Many website owners do explicitly block Bing and all other crawling bots in their right minds.

If you have a wide, low-traffic website, then bots of all sorts will make up the majority of your traffic and consequently the majority of your AWS costs.

If you see money spent on serving search engine X's crawler and very few users arriving from that search engine, it's a rational decision to block it. Or to ask for money (that actually happens).

Overall, it's a systemic problem with building a search competitor: 1) your costs are largely proportional to the size of your index; 2) your income is proportional to your userbase; 3) you need a huge index to be competitive even before you have any users.
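The three points can be turned into a toy model: costs scale with index size, revenue scales with users, and the break-even user count is set by the index you need before launch. The class and all per-document and per-user figures are hypothetical placeholders, not real search-engine economics.

```python
import math
from dataclasses import dataclass


@dataclass
class SearchEconomics:
    """Toy model of the three points above; every figure is hypothetical."""
    cost_per_doc_month: float      # crawl + storage + serving cost per indexed doc
    revenue_per_user_month: float  # ad revenue per active user

    def monthly_cost(self, index_docs: int) -> float:
        # 1) Costs grow with the size of the index, users or not.
        return self.cost_per_doc_month * index_docs

    def monthly_revenue(self, users: int) -> float:
        # 2) Income grows with the userbase.
        return self.revenue_per_user_month * users

    def monthly_profit(self, index_docs: int, users: int) -> float:
        return self.monthly_revenue(users) - self.monthly_cost(index_docs)

    def break_even_users(self, index_docs: int) -> int:
        # 3) A competitive index must exist before users arrive, so this is
        #    the number of users needed merely to cover its running cost.
        return math.ceil(self.monthly_cost(index_docs) / self.revenue_per_user_month)


if __name__ == "__main__":
    model = SearchEconomics(cost_per_doc_month=0.0001, revenue_per_user_month=1.0)
    for docs in (10**6, 10**8, 10**10):
        print(f"{docs:>14,} docs -> ~{model.break_even_users(docs):,} users to break even")
```

The point of the toy model is that the break-even user count grows linearly with index size, so a newcomer has to carry a large index at a loss until users show up.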

So, very hard to bootstrap even when you exclude all other advantages of the existing monopoly like browser-based distribution.

Microsoft has the money to beat the indexing problem, so they argue about distribution in court, but the small players can't even get to that level of failure.

> Many website owners do explicitly block Bing and all other crawling bots in their right minds.

If by "many" you mean "less than 0.1%". Nearly all sites want traffic, which search engines provide.

And you're moving the goalposts from the commenter I was responding to.

Nobody disputes it takes a large capital investment to create an index in the first place. But this is the business world -- that's what investors are for.

But the idea that websites provide their public content to Google and leave other crawlers with no way to access it is simply untrue. That is not a factor hindering competition.

> Nearly all sites want traffic, which search engines provide.

You don't seem to grasp the problem of indexing. If a search engine fetches 100M pages but brings only 100 users, it's a net loss for the website because of server costs. This means that a marginal player cannot index a big website.
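From the website's side, the decision reduces to comparing the serving cost of the pages a crawler fetches with the value of the visitors it refers. The 100M pages and 100 users come from the comment; the per-page cost and per-visitor value are hypothetical placeholders a site would replace with its own numbers.

```python
import math


def crawler_net_value(pages_fetched: int, referred_visitors: int,
                      cost_per_page: float, value_per_visitor: float) -> float:
    """Net value to a site of admitting one crawler (negative = a loss)."""
    return referred_visitors * value_per_visitor - pages_fetched * cost_per_page


def should_block(pages_fetched: int, referred_visitors: int,
                 cost_per_page: float, value_per_visitor: float) -> bool:
    """Block the crawler when serving it costs more than its referrals are worth."""
    return crawler_net_value(pages_fetched, referred_visitors,
                             cost_per_page, value_per_visitor) < 0


def break_even_visitors(pages_fetched: int, cost_per_page: float,
                        value_per_visitor: float) -> int:
    """Referred visitors needed before the crawl pays for itself."""
    return math.ceil(pages_fetched * cost_per_page / value_per_visitor)


if __name__ == "__main__":
    # Hypothetical figures: $0.00001 to serve a page, $0.50 value per visitor.
    print(should_block(100_000_000, 100, 0.00001, 0.50))
```

With those assumed figures the 100M-page crawl costs about $1,000 against $50 of referred value, so `should_block` returns True; the crawler would need to refer roughly 2,000 visitors to break even.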

I'm aware of at least one top-20 website that actively blocked crawlers other than Google, citing this as the reason. And this website has tons of high-quality UGC (user-generated content) that ranks at the top for many nontrivial queries. Its absence is a huge blow to search quality.

Having said that, it's true that for big search engines the issue is mostly distribution. However for small players it's distribution AND indexing and in the end they have to resort to buying the search results from big players.

It is true if you respect robots.txt.
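For illustration, this is what "block all bots except Google" looks like in robots.txt, checked with Python's standard `urllib.robotparser`. The file contents and the `MyStartupBot` name are hypothetical. `RobotFileParser` is given the bot's short name (e.g. `Googlebot`) rather than a full browser-style User-Agent string, and robots.txt is purely advisory: it only restrains crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that admits Googlebot and no one else.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, bot_name: str, url: str) -> bool:
    """Evaluate robots.txt rules for a bot's short name (e.g. 'Googlebot')."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "MyStartupBot"):
        print(bot, can_crawl(ROBOTS_TXT, bot, "https://example.com/profile/123"))
```

Googlebot matches its own group and is allowed everywhere; every other bot falls through to the `*` group and is refused.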

Got any reports/statistics to back this up? I highly doubt websites don't want major search engines to index them. AFAIK it's been standard practice to use `User-agent: *` for a long time. There are other anti-crawling measures because the bad crawlers are not going to respect your robots.txt.

It's not necessarily a lot of sites that block non-popular bots, but they are often big, content-centric sites (e.g., social media). Think Yelp, Twitter, LinkedIn, Instagram, etc.

That can add up to a serious percentage of the web.

Does Google penalize sites for not blocking other bots? If not, then it is not Google "maintaining their monopoly", it is everyone deciding for themselves to only support Google.

The government forcing restrictions on companies that people choose freely is extremely dangerous and will definitely be used for political reasons.

I would argue the takeaway here is not precisely that this behavior is "maintaining their monopoly", rather that it's incredibly strong evidence that the monopoly exists.

It's not a transaction, though, just a stated preference by third parties. It has no bearing whatsoever on the legal issue at stake.

Being only tangentially familiar with the indexing bots: what legal barriers are in place to prevent a competitor from impersonating a Google indexing bot? Is it just a matter of the Google bot originating from subnet x, so that’s the only one webmasters allow? What’s to stop a competitor from running their own bot but sending user-agent: totally-the-google-indexing-bot-and-not-a-competitor?

Source addresses: the Google bot traffic comes from a small set of Google-owned IP address blocks.

Third parties bake this into things like Web Application Firewall (WAF) rules. For example, Azure App Gateway WAF has a policy category for “known bots” which includes Google but excludes your tiny AI startup.

It’s a moat built by giant corporations to keep tiny players in their place.

Google "helpfully" publishes their bot source IP addresses: https://developers.google.com/static/search/apis/ipranges/go...
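This is why spoofing the User-Agent doesn't get a competitor through: a WAF rule can require that anything claiming to be Googlebot also originate from one of the published ranges. The sketch below assumes the published file is JSON with a `prefixes` list of `ipv4Prefix`/`ipv6Prefix` entries; the sample prefixes are documentation-only ranges (RFC 5737 and RFC 3849), not Google's real ones.

```python
import ipaddress
import json
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Assumed file shape; the prefixes are documentation-only example ranges
# (RFC 5737 / RFC 3849), NOT Google's real crawler ranges.
SAMPLE_RANGES_JSON = """
{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"},
              {"ipv6Prefix": "2001:db8::/32"}]}
"""


def load_networks(ranges_json: str) -> List[Network]:
    """Parse a published ranges file into network objects."""
    networks: List[Network] = []
    for entry in json.loads(ranges_json).get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_verified_googlebot(user_agent: str, source_ip: str,
                          networks: List[Network]) -> bool:
    """True only if the request both claims to be Googlebot and comes
    from a published range; a spoofed User-Agent alone fails."""
    if "googlebot" not in user_agent.lower():
        return False
    ip = ipaddress.ip_address(source_ip)
    # Membership across IP versions (v4 address, v6 network) is simply False.
    return any(ip in net for net in networks)
```

A WAF "known bots" category presumably does something similar against a maintained list, which is exactly why a new crawler with no entry on that list is treated as unknown.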

AWS also provides named rules such as "bot:name:googlebot": https://docs.aws.amazon.com/waf/latest/developerguide/aws-ma...

I agree it’s a moat, but why would Azure want to shield Google from competition? I think it’s just another example of anti-abuse collateral damage, like email anti-spam blocking small unknown servers or Cloudflare blocking unknown IPs.

Because Azure customers want Google to be able to index their sites.

Would you host your e-commerce or social media site on a cloud provider that blocked Googlebot?

It's not Microsoft giving Google a handout out of the goodness of their hearts; it's Azure customers demanding that functionality. (Those Azure customers also don't care about a random little search startup and probably don't want to pay any egress fees to serve traffic to it.)

AWS offers the same: https://docs.aws.amazon.com/waf/latest/developerguide/waf-bo...

Oracle as well: https://docs.oracle.com/en-us/iaas/Content/WAF/Bot/good_bot_...

Akamai as well: https://www.akamai.com/products/bot-manager

Moats are anticompetitive; that's the entire reason founders want them. Who else would a moat keep away, customers? ;)

Customers are kept in by moats.

That's lock-in; moats are things that keep your competitors out. Source: https://www.investopedia.com/ask/answers/05/economicmoat.asp

> For example, Azure App Gateway WAF has a policy category for “known bots”

So why should we take Satya's testimony seriously? Azure is a Microsoft product, and if what you're saying is true, Microsoft is part of the problem.

Almost surely because Azure customers want that.

https://news.ycombinator.com/item?id=37746480

Though ChatGPT seems to have had no trouble scraping data from everywhere.

In the past couple of years, many large tech companies developed a legal theory that scraping the public web for "internal" purposes is OK, and that any ToS-based or technical restrictions are just suggestions that they don't have to follow.

These "internal" purposes included growing your social network, monitoring or reverse-engineering the algorithms of competing search engines, and now, it includes training ML.

Which is funny given that when others are doing it to them, they go to great lengths to stop it, and sometimes complain loudly or threaten lawsuits.

I think the main reason the big players don't sue each other is that it's a bit of a Mutual Assured Destruction kind of a deal. Google is doing it to Microsoft, Microsoft is doing it to Google...

Technology patents create a complex MAD situation. At the end of the day, Alphabet will be paying tons of money to Microsoft, who will pay it to Apple, who will pay it to Samsung, who will pay it to probably Nokia or Xerox or Nintendo or whoever. And this goes on for each and every company with wealth in these spaces.

They (BigCo) traditionally cross-license to benefit each other and lock out new entrants.

And companies like Reddit and DeviantArt realized too late the billions of dollars in UGC value that had been lying stagnant on their websites.

>One way Google maintains their monopoly is that many websites block all bots except for the Google indexing bot.

Because Googlebot is actually _friendly_.

Oh boy, are there bad search engine bots out there that often DDoS your site to shit.

Yandex is probably the worst out there, from experience. Not only will it hammer you, but they have extremely aggressive retries: even if you ban it temporarily for 4 hours, the literal second your block on their spiders' IPs expires, they will flood you with requests. They don't even know how long the ban is; their bots just keep trying and trying.
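For contrast, a well-behaved crawler spreads its retries out instead of hammering the instant a block lifts. A minimal sketch, assuming capped exponential backoff with "equal jitter" and a `Retry-After` value already parsed into seconds (the header may also carry an HTTP date, which is not handled here); the base delay and one-day cap are arbitrary choices.

```python
import random
from typing import Optional


def next_delay(attempt: int,
               retry_after_s: Optional[float] = None,
               base_s: float = 60.0,
               cap_s: float = 24 * 3600.0,
               rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Capped exponential backoff with "equal jitter": the wait is drawn from
    [ceiling/2, ceiling], so retries from many workers don't line up.
    """
    rng = rng or random.Random()
    # Clamp the exponent so huge attempt counts can't overflow a float.
    ceiling = min(cap_s, base_s * 2 ** min(attempt, 60))
    delay = ceiling / 2 + rng.uniform(0, ceiling / 2)
    # Never retry sooner than the server explicitly asked.
    if retry_after_s is not None:
        delay = max(delay, retry_after_s)
    return delay
```

With these defaults the first retry waits 30 to 60 seconds and later ones grow toward the one-day cap, so a blocked crawler backs off further each time instead of flooding the site the moment a ban expires, and an explicit Retry-After is always respected.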

I think most corporations allow known bots. Allowing all traffic to a major site is not possible.

The problem is bigger the other way around, I would say: Google, AWS, Azure and so on use the same AS numbers for their private and public clouds. It is not easy to detect whether it really is the Google bot or some lowlifes performing DoS attacks from GCP. Many attacks, especially state-sponsored ones, come from trusted clouds.
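Because a cloud's address space is shared between the provider's own crawler and its customers' VMs, an IP or AS check alone cannot separate Googlebot from an attacker on GCP. Google documents a reverse-then-forward DNS lookup for verifying its crawlers; the sketch below shows that general technique (forward-confirmed reverse DNS) with injectable resolvers so it can be tested offline. The allowed suffixes passed in are assumptions to be checked against the crawler operator's documentation.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Set


def reverse_lookup(ip: str) -> str:
    """PTR lookup via the system resolver (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Set[str]:
    """All A/AAAA addresses for `host` (raises OSError on failure)."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def fcrdns_matches(ip: str, allowed_suffixes: Iterable[str],
                   reverse: Callable[[str], str] = reverse_lookup,
                   forward: Callable[[str], Set[str]] = forward_lookup) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must be under an allowed
    domain AND must resolve back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be verified
    # Step 1: the reverse name must sit under one of the allowed domains.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    # Step 2: that name must resolve back to the original address, since
    # whoever controls the IP's PTR record can claim any hostname.
    try:
        resolved = {ipaddress.ip_address(a) for a in forward(host)}
    except OSError:
        return False
    return ipaddress.ip_address(ip) in resolved
```

The forward step is what defeats a GCP tenant: they may be able to set their VM's PTR record to any name, but only the owner of that name's domain controls what it resolves back to.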

Agree on the first part, but I think actually this is an argument for breaking them up along those lines.

I've often said that the "breaking up" of Google that makes the most sense is to split the crawling/indexing service from the rest of the company. Similar to the way telecoms companies were forced -- in some countries -- to open up access to e.g. DSL infrastructure.

> One way Google maintains their monopoly is that many websites block all bots except for the Google indexing bot.

How can that be blamed on Google?

I agree. I don't relish the idea of certain other countries' services growing too large in the vacuum that breaking up Microsoft, Apple, Amazon, and Alphabet would leave.

Just imagine TikTok, then have that applied to Search, Mail, etc.

Do you want this?

> One way Google maintains their monopoly is that many websites block all bots except for the Google indexing bot.

No, I don't think that's true. I am writing a web crawler, not for search purposes, and I haven't seen preferential treatment for Googlebot compared to others. Sure, some might be banned outright (though bad crawlers just ignore robots.txt and do whatever they want), but in most cases new bots have the same access rights as Googlebot.

Also, your sentence doesn't pass the sniff test: you claim Google has better access than all other crawlers; but robots.txt is solely in the hands of the webmaster. How does Google coerce most website owners to block other bots? There is no conspiracy at play here.

Try LinkedIn, or Facebook, or Twitter, or Crunchbase.

I feel like something as fundamental to society as the internet should have a public index, no?

The idea sounds nice but politicians controlling an index would be an abject disaster.

Competitive markets > Governments > monopolies

I like this! But I do think many markets make sense to be heavily regulated, beyond just anti-trust. (For instance, my work involves wholesale power markets, which are heavily regulated by necessity, but in my view remain better than vertically integrated monopoly power providers.)

Government is the ultimate coercive monopoly; all coercive monopolies derive their power from the arbiters of force — the government.

Governments are at least supposed to be accountable to their constituencies, monopolies are only accountable to their shareholders.

"Of all tyrannies, a tyranny sincerely exercised for the good of its victims may be the most oppressive. It would be better to live under robber barons than under omnipotent moral busybodies. The robber baron's cruelty may sometimes sleep, his cupidity may at some point be satiated; but those who torment us for our own good will torment us without end for they do so with the approval of their own conscience."

What about efforts by say, the Internet Archive, and maybe more specialized indexes maintained by communities?

Already has been - what fraction of 'right to be forgotten' requests are actually in the social good, as opposed to just criminals of various levels covering up their crimes?

Of course, "criminals of various levels" "covering up" their, say, stupid 19-year-old mistake drug conviction is exactly why many people favor right to be forgotten things. The idea that it would be in the social good to have easy access to everything someone did or posted for decades seems ludicrous. Privatized ubiquitous surveillance and record-keeping is not a good thing.
Civil society was founded on, and still depends on, public trials and public laws. Since Hammurabi. That has nothing to do with "ubiquitous surveillance", it is the most important and basic record keeping.

It is up then to each person reading the records to decide if a crime is old enough or unimportant enough to not matter. That should not be the decision of internet censors or courts who want to sweep their traces under the rug. Or criminals wanting to do that.

If most people think that drug use is not something a person should be tried for, then the laws need to be changed. Censoring doesn't remove the fact. It just pushes discourse into the realm of gossip.

I think of it more as a "right to rewrite history".

"He who controls the past controls the present, and he who controls the present controls the future."

Are there any requests that are actually just criminals of various levels covering up their crimes? Ever since the right to be forgotten has existed, I've been hearing about this kind of misuse, but I have never seen a single case presented that shows how it is being misused. If you don't know of such a case, then what is your opinion of this right based on?