
	by fotcorn 940 days ago
	A charitable interpretation might be that search requires a fair amount of compute, and is therefore a big denial-of-service vector. I am not sure how much behavioral data GitHub can gather from logged-in users, and how useful that is compared to the code that is there anyway. Maybe to figure out which parts of code are important? But that isn't really user-specific.

marginalia_nu 940 days ago

Yes, it's a real problem for anyone offering any sort of search capabilities. Like, about 0.5% of the traffic to my search engine is human. I'm not aware of any search engine that doesn't have similar stats.

hskalin 940 days ago

Off topic: how do you determine what percentage of search is coming from humans?

marginalia_nu 940 days ago

Well, about 99% of the search requests I got back when I was using Cloudflare couldn't get past their bot mitigation, and of what made it through, at least half looked very automated.

superkuh 940 days ago

I'm a human and I can't get past Cloudflare "bot mitigation" with my browser. Bot mitigation actually just means your browser executing the latest bleeding edge javascript functions to make sure your behavior is monetizable.

marginalia_nu 940 days ago

No, that's not actually true at all. The website always worked with text-only browsers, Cloudflare or not. Thoroughly tested with the likes of w3m and dillo.

Virtually all of the traffic that was intercepted claimed to be modern Chrome or Safari or similar, which should be capable of "executing the latest bleeding edge javascript functions".

The primary reason why anyone gets shit from bot mitigation is IP reputation; this is far more important (and effective) than looking at browser characteristics.
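
To make the distinction concrete, here is a minimal sketch of a challenge decision driven by IP reputation alone. It is not how Cloudflare or any real product works; `FLAGGED_NETWORKS` and `should_challenge` are hypothetical names, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical reputation list: networks previously seen sending abusive traffic.
# 203.0.113.0/24 is a range reserved for documentation, used here as a stand-in.
FLAGGED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def should_challenge(client_ip, user_agent):
    """Decide whether to challenge a request based only on where it comes from.

    The User-Agent is deliberately ignored: automated traffic can claim to be
    modern Chrome or Safari, so the header says little about the client.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in FLAGGED_NETWORKS)
```

Under this sketch, a request claiming to be Chrome from a flagged network is challenged, while w3m from an unflagged address is not.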

crtasm 940 days ago

GitHub could require a captcha for non-logged-in users, I suppose.

dylan604 940 days ago

I'm a human, yet I am unable to get past Steam's captcha. It is not the only site where I cannot prove that I'm not a robot. I'm guessing the collateral damage is worth it to them. I'm not a big gamer, and wouldn't be a big source of revenue for them anyway.

Kuinox 940 days ago

Steam has a captcha?

kuchenbecker 938 days ago

No, dylan604 is a suspected Cylon /s.

Many companies will black-hole you and force infinite captchas, despite your solving them correctly, to waste resources.

mikro2nd 940 days ago

This (behavioural data) is precisely Microsoft's playbook - no charitable interpretations ought to apply. As far as I am concerned, no Open Source project has any justification for still being on the platform as of the day of the MS buyout. It's not as though there aren't good alternatives just a git clone away.

yladiz 940 days ago

> This (behavioural data) is precisely Microsoft's playbook

What behavioral data can you glean from a code search like GitHub's? The context is very different from, for example, Google's, so is there really much useful data you can get here?

mikro2nd 939 days ago

From a code search in the wild, with no context? Not a lot. From a code search from a person who's logged in, identified? Well, probably still not a lot, but it's another factoid about that person to hang onto the knowledge graph.

SnowflakeOnIce 940 days ago

Another factor: anonymous faceted regex search across a huge volume of code allows bad actors to find hardcoded credentials and gain access to additional systems, without a good audit trail.

But yes, there are multiple good explanations for why they would lock down the API.
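
As a rough illustration of why regex search over code is sensitive, a couple of patterns are enough to surface likely secrets. This is a sketch, not GitHub's search or any real scanner: `PATTERNS` and `find_hardcoded_credentials` are hypothetical names, the rule set is deliberately tiny, and the key below is a placeholder in the AWS access key ID format, not a live credential.

```python
import re

# Illustrative patterns only; real secret scanners use much larger rule sets.
PATTERNS = {
    # AWS access key IDs are "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Any quoted value assigned to a name ending in "password" (case-insensitive).
    "password_assignment": re.compile(r"(?i)password\s*=\s*[\"'][^\"']+[\"']"),
}

def find_hardcoded_credentials(source):
    """Return (line_number, pattern_name) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits

SAMPLE = '''\
region = "us-east-1"
key_id = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2"
'''

for lineno, name in find_hardcoded_credentials(SAMPLE):
    print(f"line {lineno}: possible {name}")
```

Run anonymously against a huge public corpus, the same kind of query leaves no record of who went looking, which is the audit-trail concern raised above.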