Hacker News
by izzytcp 1757 days ago
This sounds too good to be true in practice. If I deploy it on a server and continuously query it from all of my devices, will Google ban that server IP?
5 comments

I set this up for my family and me a few months ago and we set all our browsers and devices to use it as the default search engine. We've been really happy with it.

I also set up a small script on a cron job that queries random search strings every few minutes and opens the first few hits in selenium. My theory is that if I can't completely stop them from tracking us, I can at least dilute their data with bogus searches.

We haven't had any issues from Google.
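
For anyone curious what the query-generating half of a script like that might look like, here is a minimal sketch. It is not the commenter's actual script: the word list, the `random_query` and `search_url` helpers, the `/search?q=` path, and the `search.example.com` host are all assumptions for illustration, and the Selenium part is left out entirely.

```python
import random
from urllib.parse import urlencode

# Hypothetical word list; a real script would likely draw on a larger dictionary.
WORDS = ["weather", "recipe", "bicycle", "history",
         "guitar", "garden", "laptop", "museum"]


def random_query(rng, min_words=1, max_words=3):
    """Pick a few distinct words at random and join them into a search string."""
    count = rng.randint(min_words, max_words)
    return " ".join(rng.sample(WORDS, count))


def search_url(base_url, query):
    """Build a search URL with the query URL-encoded (spaces become '+')."""
    return f"{base_url}/search?{urlencode({'q': query})}"


if __name__ == "__main__":
    rng = random.Random()
    # Placeholder host: substitute wherever the instance is deployed.
    print(search_url("https://search.example.com", random_query(rng)))
```

A cron entry could run this every few minutes and hand the printed URL to whatever browser automation is in use.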

I guess it will be just like Searx: https://searx.me/

If an IP generates too many search queries, Google will block / throttle it…
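
Nobody outside Google knows how their throttling really works, but the general idea of per-IP throttling can be sketched with a sliding-window counter. Everything here (the `SlidingWindowLimiter` class, the limit of 3 requests per 60 seconds) is a made-up illustration, not Google's mechanism.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: throttle this request
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    for t in (0, 1, 2, 3, 60):
        print(t, limiter.allow("203.0.113.7", now=t))
```

The fourth request inside the window is refused, and the same IP is allowed again once the oldest hit ages out. Other IPs are counted separately.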

I'm on a shared IP with millions of people using the same public IP (T-Mobile CGNAT), there is one IP (many of them, actually) doing that right now from every T-Mobile customer. Your one server will be a blip on their radar if it even registers.
Dude, thanks for letting me know that everyone uses Google.

I'm talking about static IPs like servers in the cloud, not your home. That stuff is automated and I am sure static IPs get banned, but there must be a quota or something.

Dude, CGNAT is handled at the ISP layer, I do not have an IPv4 address at all locally, it's a 464XLAT done on T-Mobile's side. All users come from a shared IPv4 on their network, not mine. Dude.
Dude - Each of those users behind the NAT will have a different set of cookies, user agents, screen sizes, among other fingerprints that qualify them as unique. ISPs also routinely place their CGNAT addresses on specific whitelists so that services don't block them for abuse (you can look through the NANOG email list to find examples of this.) IP addresses are also classified as residential, cloud/server, etc. If Google sees rapid requests from the same IP classified as a server that's sending a Python Requests user-agent, they can absolutely block it.
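
To make the point above concrete, here is a toy scoring heuristic that combines IP classification, user agent, and request rate. The weights, the threshold, and the `Request`, `suspicion_score`, and `should_challenge` names are all invented for illustration; the only real-world detail is that the Python Requests library identifies itself with a `python-requests/<version>` user agent by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    ip_class: str             # assumed labels: "residential", "cgnat", "server"
    user_agent: str
    requests_per_minute: int


def suspicion_score(req):
    """Add up hypothetical weights for signals that suggest automation."""
    score = 0
    if req.ip_class == "server":
        score += 2            # cloud/server address space
    if req.user_agent.lower().startswith("python-requests"):
        score += 2            # default user agent of the Requests library
    if req.requests_per_minute > 30:
        score += 1            # rapid requests
    if req.ip_class == "cgnat":
        score -= 2            # shared address the ISP has had allowlisted
    return score


def should_challenge(req, threshold=3):
    """Show a captcha (or block) once the score reaches the threshold."""
    return suspicion_score(req) >= threshold


if __name__ == "__main__":
    bot = Request("server", "python-requests/2.26.0", 60)
    phone = Request("cgnat", "Mozilla/5.0 (Android 11; Mobile)", 500)
    print(should_challenge(bot), should_challenge(phone))
```

Under these made-up weights the busy CGNAT address is left alone while the cloud server running a script gets challenged, which is the distinction being argued here.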
how long should google block the IP? when the ISP reassigns it to a new home they would also be blocked from google
Obviously I can't answer how long Google should block an abusive IP address since I'm not Google.

A CGNAT IP address is not reassigned to a home, it's shared among many homes. If you meant from my example the cloud server IP, that is one issue that comes up pretty often on cloud services and there's not a clean way around it.

For example I use Linode as my VPN server, I used to have all sorts of trouble with Google making me enter captchas or blocking my search just because of the abuse coming from the same IP range. I actually can't even login to some apps while on my VPN, and I've had this same IP address on my Linode for close to 10 years, so it's not an issue with my /32 specifically.

You'll see the same thing on AWS: many of those EC2 instances can't be used for sending email or for VPN services because previous users of the IP space abused their way onto block lists.

I get "you appear to be a robot" whenever I use my DigitalOcean box as an exit node. I'd imagine you'd have to host this at home or get really lucky there

The moment I switch it on and use Google, no excessive searches etc

So, I set it up this weekend on a VPS and subdomain and today it's blocked :(.

    About this page

    Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?

    IP address: 137.x.x.x
    Time: 2021-08-30T16:56:45Z
    URL: https://www.google.com/search?gbv=1&q=broken&lr=&hl=en&safe=off
scraping yes. normal usage no.
Isn't normal usage scraping in this case? Looking at whoogle-search/request.py it's "scraping" google urls via the python requests module. I'm reasonably sure google fingerprints requests and assigns different weights for "probably scraping". I wouldn't be surprised if this has a lower threshold for triggering their captchas and/or blocking.
I often trigger Google's captchas during normal usage. It seems to suspect the more "advanced" features like "intitle:" or "inurl:", or if I search too rapidly. I take being mistaken for a machine as a compliment!
That's understandable. A lot of exploit seekers use those features to find exploits, e.g. "powered by [cms with known exploit]". Google (and Bing) are definitely more prone to showing you a captcha for those searches, especially if you're looking beyond the first page.