by PaulHoule 530 days ago
In the early 2000s I was working at a place that Google wanted to crawl so badly that they gave us a hotline number to call if their crawler was giving us problems.

We were told at the time that "robots.txt" enforcement was the one thing they had that wasn't fully distributed; it's a devilishly difficult thing to implement.

It boggles my mind that people with the kind of budget that some of these people have are struggling to implement crawling right 20 years later, though. It's nice those folks got a rebate.

One of the reasons people are testy today is that you pay by the GB with cloud providers. About 10 years ago I kicked out the sinosphere crawlers like Baidu because they were generating something like 40% of the traffic on my site, crawling over and over again and not sending even a single referrer.

1 comment

I've found Googlebot has gotten a bit wonky lately. 10X the usual crawl rate and

- they don't respect the Crawl-Delay directive

- Google Search Console reports 429s as 500s

https://developers.google.com/search/docs/crawling-indexing/...
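For reference, here is a minimal sketch of what honoring Crawl-Delay looks like on the crawler side, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the `crawl_policy` helper are all made up for illustration; this says nothing about how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_policy(robots_txt: str, user_agent: str, url: str):
    """Return (allowed, delay_seconds) for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = parser.crawl_delay(user_agent)
    return allowed, delay


if __name__ == "__main__":
    for path in ("/index.html", "/private/report"):
        allowed, delay = crawl_policy(ROBOTS_TXT, "ExampleBot", "https://example.com" + path)
        print(path, "allowed" if allowed else "blocked", "delay:", delay)
```

A well-behaved crawler would sleep for `delay` seconds between requests to the same host and back off further on a 429 response.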

I have found Google severely declining in engineering quality. On January 8th, 2025, they stopped accepting JCB credit cards and emailed customers that their payment info was invalid and that they would be suspended (search Twitter for examples in Japanese). It seems it was a bug, but customers who received the notification got no explanation. Opening a ticket resulted in it being closed immediately while being lied to (my only guess is that they wanted to improve their metrics). How was this not quality checked in the first place? I guess Google has a policy of recording the chat transcript (where the lies are recorded), but that means nothing when the company doesn't care. I don't like it, but AWS seems like the next logical place to move business to. As far as I can tell, the support there is real.