Hacker News
by natorion 1221 days ago
We regularly get feature requests for Headless to provide a field or property that JS frameworks can poll to detect whether Headless is active, e.g. window.isBot.

Well, Headless is open source, which means anybody could build a Headless version with such a property set to "I am a human, trust me!" and employ such a modified binary ... ;-)
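The same applies to server-side signals. Here is a minimal Python sketch, assuming a server that looks for the "HeadlessChrome" token that the older headless Chrome mode puts in its default User-Agent. The function name is hypothetical, and a one-line change on the client side is enough to defeat the check.

```python
from typing import Mapping


def looks_headless(headers: Mapping[str, str]) -> bool:
    """Naive detection: flag requests whose User-Agent mentions HeadlessChrome.

    This trusts whatever the client sends, so it only catches unmodified clients.
    """
    # HTTP header names are case-insensitive, so find User-Agent regardless of case.
    user_agent = next(
        (value for name, value in headers.items() if name.lower() == "user-agent"),
        "",
    )
    return "HeadlessChrome" in user_agent


# Illustrative User-Agent strings (version number made up for the example).
stock = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/110.0.0.0 Safari/537.36"
)
spoofed = stock.replace("HeadlessChrome", "Chrome")  # the "trust me, I'm human" build

print(looks_headless({"User-Agent": stock}))    # True
print(looks_headless({"user-agent": spoofed}))  # False
```

Any self-reported flag works the same way, whether it is a header or a window property. It only tells you what the client chose to say.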


Oh absolutely, relying on a header would be a placebo at best. I was thinking more along the lines of having two teams: one that develops Headless and another team at Google that tries to defeat it nonstop. An official game of cat and mouse. Project: Tom and Jerry? I guess legal would never buy into that name.

My own personal method for my silly hobby sites is just to put passwords on things with an auth prompt delay.

Why should Google red-team their headless browser, though? As other comments point out, there are plenty of ways for bot detectors to identify bots even with a browser that mirrors a normal one: https://news.ycombinator.com/item?id=34858056

Almost all of those things are outside the scope of the browser itself. And anyone doing serious bot attacks already has scripts/forks that modify these signals. I don't see how the Chrome team could do much to help stop that at that level.

In theory their blue team could come up with even more advanced puzzles that bots trip over and then open source and document the bot puzzles. I don't know that they would, incentives or lack thereof and all. If nothing else it might make their work day more fun.

Or, if I put my evil-corp hat on, the incentive could be that they make puzzles that only Headless can get around, so that all other bots become trivial for even the least knowledgeable hobbyist to block, and thus obsolete. Perhaps Google releases Nginx, Apache HTTPD, Apache Traffic Server, Envoy and HAProxy modules that only Headless can get around, and all other bots internet-wide are entirely silenced. Chrome becomes the one and only bot to rule them all.

Why would they want to do that?

Oh man, you're making me put that hat back on.

I suppose that Google going through that exercise would mean that they gain market dominance in bot-driven data gathering, and anyone not using Chrome Headless would be unable to obtain freebie data. This could enable future features, whatever those may be. (readjusts hat) One future feature could be auto-discovery of Google DNS and Google proxies in GCP, so they can learn about new data sources through crowd-sourcing, thus making their big-data sets more complete and their machine learning more powerful. Developers could block the proxies or compile them out, but as we know, most people are too lazy to do this and many won't care.

Another advantage would be that eventually the only bots abusing Google would be bots using their code, and they would know how to detect and deal with them, since they would implement their own open source anti-bot modules in their web servers, load balancers, etc.

There are more obscure ideas but I am doffing the hat before the hat-wraiths sense it.

You jest, but I could actually see this becoming a thing. I envision a future dystopian internet where people first have to authenticate their network gear, PCs, laptops, cell phones, cars, trucks, e-bikes, toasters and coffee makers to a government-contracted service. Once authenticated, they would use something similar to that RFC, but probably with a nonce or JWT tied to their device and embedded in the packet header somehow. Then sanctioning a continent, country, state, ISP, city, company, manufacturer, distributor or person would simply mean disabling their evil bits, so to speak.

The push for this is starting with adult content [1], but the goalposts could easily be mounted on a train car on a very long, smooth track that only goes downhill.

[1] - https://news.ycombinator.com/item?id=34726509

There's a huge amount of aggro, pissy shit-throwing in these threads about Chrome facilitating automation. Bollocks.

You know what? The Internet Is For End Users [1]. If we're going to cite an RFC, it should be RFC 8890. Not having a better headless Chrome would be a violation of the most basic principles of the internet.

There are some cases where automation can get out of hand, but blocking these efforts should not come at the user's expense. So says RFC 8890, and a general collective belief/hum-in-the-room. A good browser like Chrome helping with automation should not be an issue, given how many other ways bad actors have to go too far & cause harm to sites. The people who have to deal with this are not the priority, & this doesn't radically change their troubles; it does, though, radically help end users who wish to exercise agency.

In most cases, being able to script & automate a site is a completely basic exercise of user agency, of no special regard. Headless Chrome being a somewhat tolerable way of doing that scripting is 100% moral and correct. It greatly assists us in fulfilling a primary & clear overarching purpose of the internet: to be for end users.

I wish I could say I cannot believe the complaining & whining & snivelling, the pretentious nonsense/acting offended that Chrome would dare help make good automation. I wish I could say I don't think this crowd recognizes or comprehends the basic purpose of the internet, but again, I think I know better; I suspect they do, but their protests are disingenuous, and they have allied their hearts with darker forces, against the user.

[1] https://www.rfc-editor.org/rfc/rfc8890

>Headless is open source, which means anybody could build a Headless version with such a property set to "I am a human, trust me!"

This is flawed reasoning. Just because we can't eliminate abuse from headless browsers doesn't mean we shouldn't work to reduce it. Finding such a modified binary or making it yourself is additional friction that will cause fewer of these bots to exist. Some people may not care whether a website is able to block them, and some may not bother to do the work of reading robots.txt. By building these capabilities into the product by default, you are making the web ecosystem a better place with less abuse. You are right that someone could make a version without the anti-abuse parts, but surely that fork will be less popular and less used.
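On the point about bot authors skipping robots.txt: honoring it is not much work, which is part of why a default matters. Here is a minimal Python sketch using the standard library's urllib.robotparser. It assumes the rules are already available as text; a real crawler would instead call set_url() and read() to fetch them. The bot name, rules and URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"))  # False
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))   # True
```

Of course, like any opt-in convention, this only constrains bots that choose to run the check.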

What about if I want the headless browser to look exactly the same? Why should we make a distinction between humans and machines?

If I run a soup kitchen, and Google is sending robots to my establishment which are indistinguishable from humans, I should have the right to ask if the client is a robot.

I would hope that Google's robots would not be programmed to lie to me, but would be honest.

If robots are required to be honest, then I have a choice to serve them or not. If they are not honest, I do not have a choice.

Then don't add code to your site to make it work differently?

>Why should we make a distinction between humans and machines?

Because machines can be used to abuse a site at a scale that humans can't. Site owners want to protect their site against abuse.

By modifying the browser. It feels like DRM by a different name to me.

Okay? I don't care what you call it. It will reduce the amount of abuse in the world and that is a good thing.

While I appreciate your answer from a technical point of view - indeed it is trivial to modify/spoof - there is an ethical dimension.

Should bots have the legal right to say they are human?

For example, if Google Inc. is visiting a web page to collect information about it using a headless browser, and the server asks "are you a bot?", should Google be legally or ethically allowed to answer no? (Declarations in headers could remove the need for question/answer chatter.)

(I want to pre-empt dismissing this line of questioning via "what if Google wants to know how the site will be served to a human, for better search results?", because Google could include a specific header for that, e.g. "I am a bot, but request that you serve the version of this page served to humans". It would be up to the server to honor or reject that request.)
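As a rough illustration of the server side of that idea, here is a minimal Python sketch. The header name "Bot-Declaration" and its values "bot" and "bot; serve=human-view" are invented for illustration and are not an existing standard. Whether to honor the request is a policy flag the server operator sets.

```python
from typing import Mapping

# Hypothetical header name and values, invented for this sketch.
BOT_HEADER = "bot-declaration"
HUMAN_VIEW_REQUEST = "bot; serve=human-view"


def choose_variant(headers: Mapping[str, str], serve_human_view_to_bots: bool) -> str:
    """Pick which version of a page to serve: "human" or "bot".

    A client that declares itself a bot gets the bot version, unless it asks for
    the human version and the server's policy allows that.
    """
    # HTTP header names are case-insensitive, so normalise names and values.
    normalised = {k.lower(): v.strip().lower() for k, v in headers.items()}
    declaration = normalised.get(BOT_HEADER)
    if declaration is None:
        return "human"  # no declaration: indistinguishable from a human visitor
    if declaration == HUMAN_VIEW_REQUEST and serve_human_view_to_bots:
        return "human"  # honest bot, request honored
    return "bot"


print(choose_variant({"Bot-Declaration": "bot; serve=human-view"}, True))   # human
print(choose_variant({"Bot-Declaration": "bot; serve=human-view"}, False))  # bot
print(choose_variant({"Bot-Declaration": "bot"}, True))                     # bot
print(choose_variant({}, False))                                            # human
```

The last case is the crux of the thread. A bot that simply omits the header looks like a human, so a scheme like this only gives the server a choice when bots are honest by default.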

The defaults Google choose have compounding effects in our society. If you make it "normal" for bots to pretend to be human, the industry has minimal pressure to hold any standard above what you do, and better norms may never appear, or be delayed by a decade. The alternative is to be thoughtful today to try to create a better world.