by fotcorn 940 days ago
A charitable interpretation might be that search requires a fair amount of compute, and is therefore a big denial of service vector.

I am not sure how much behavioral data GitHub can gather from logged-in users, or how useful that is compared to the code that is there anyway. Maybe to figure out which parts of the code are important? But that isn't really user-specific.


Yes, it's a real problem for anyone offering any sort of search capabilities. Like, about 0.5% of the traffic to my search engine is human. I'm not aware of any search engine that doesn't have similar stats.
Off topic: how do you determine what percentage of search is coming from humans?
Well, about 99% of the search requests I got back when I was using Cloudflare couldn't get past their bot mitigation, and of what made it through, at least half looked very automated.
I'm a human and I can't get past Cloudflare "bot mitigation" with my browser. Bot mitigation actually just means your browser executing the latest bleeding edge javascript functions to make sure your behavior is monetizable.
No, that's not actually true at all. The website always worked with text-only browsers, Cloudflare or not. Thoroughly tested with the likes of w3m and dillo.

Virtually all of the traffic that was intercepted claimed to be modern Chrome or Safari or similar, which should be capable of "executing the latest bleeding edge javascript functions".

The primary reason anyone gets shit from bot mitigation is IP reputation; this is far more important (and effective) than looking at browser characteristics.
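To make the point concrete, here is a rough sketch of a challenge decision that looks only at IP reputation and ignores the claimed browser entirely. Everything in it is an assumption for illustration: the `IP_REPUTATION` table, its scores, the `should_challenge` name and the 0.5 threshold are all made up, and this is not how Cloudflare or any particular vendor works.

```python
from dataclasses import dataclass

# Hypothetical reputation table: 0.0 = known abusive, 1.0 = clean.
# A real setup would query some reputation feed; these values are invented.
IP_REPUTATION = {
    "203.0.113.7": 0.05,    # documentation-range address standing in for a botnet node
    "198.51.100.20": 0.95,  # stand-in for an address with no abuse history
}


@dataclass
class Request:
    ip: str
    user_agent: str  # carried along, but deliberately not used in the decision


def should_challenge(req: Request, threshold: float = 0.5,
                     default_score: float = 0.5) -> bool:
    """Return True if the request should be challenged or blocked.

    Only the IP's reputation score matters here; unknown IPs get
    default_score, which is not below the default threshold.
    """
    score = IP_REPUTATION.get(req.ip, default_score)
    return score < threshold


# A "modern Chrome" user agent from a bad IP is challenged...
bot = Request("203.0.113.7", "Mozilla/5.0 Chrome/120.0")
# ...while a text-only browser from a clean IP goes straight through.
human = Request("198.51.100.20", "w3m/0.5.3")
```

With these made-up scores, `should_challenge(bot)` is true and `should_challenge(human)` is false, regardless of what either user agent string says.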

GitHub could require a captcha for non-logged-in users, I suppose.
I'm a human, yet I am unable to get past Steam's captcha. It is not the only site where I cannot prove I'm not a robot. I'm guessing the amount of collateral damage is worth it to them. I'm not a big gamer, and wouldn't be a big source of revenue for them anyway.
Steam has a captcha?
No, Dylan604 is a suspected Cylon /s.

Many companies will black-hole you and force infinite captchas, despite you solving them correctly, to waste your resources.

This (behavioural data) is precisely Microsoft's playbook - no charitable interpretations ought to apply. As far as I am concerned, no Open Source project has had any justification for still being on the platform since the day of the MS buyout. It's not as though there aren't good alternatives just a git clone away.
> This (behavioural data) is precisely Microsoft's playbook

What behavioral data can you glean from a code search like GitHub's? The context is very different than, for example, Google's, so is there really much useful data you can get here?

From a code search in the wild, with no context? Not a lot. From a code search from a person who's logged in, identified? Well, probably still not a lot, but it's another factoid about that person to hang onto the knowledge graph.
Another factor: anonymous faceted regex search across a huge volume of code allows bad actors to find hardcoded credentials and gain access to additional systems, without a good audit trail.

But yes, there are multiple good explanations for why they would lock down the API.
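For anyone wondering what a regex search for hardcoded credentials looks like in practice, here is a minimal sketch you could run against your own code before someone else does. The two patterns are assumptions picked for illustration (the well-known `AKIA` prefix format of AWS access key IDs, and a naive quoted `password = "..."` assignment); real secret scanners use far more rules, and `find_secrets` is just a name made up for this example.

```python
import re

# Illustrative patterns only; a real scanner would have many more rules.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Naive match for a quoted password assignment.
    "password_assignment": re.compile(
        r"password\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = """\
import os
AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2"
token = os.environ["TOKEN"]
"""
hits = find_secrets(sample)
```

On the sample above, `hits` flags line 2 as an AWS key ID and line 3 as a password assignment, while the line that reads the token from the environment is left alone.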