Hacker News new | ask | show | jobs
by throwaway_62022 1253 days ago
I think it is problematic that we are using github issues as "support forum" for asking a git host provider to be excluded from the refresh list. This should not have come to that. Whatever happened to "reasonable defaults", so as a random person hosting a single Go module doesn't get DOSed - https://github.com/golang/go/issues/44577#issuecomment-86087... ?
1 comments

Everyone can make their own assessment of what is a reasonable default and what counts as a DoS (and they are welcome to opt-out of any traffic), but note that 4GB per day is 0.3704 Mbps.
Comes around $8-11 of egress monthly traffic on AWS. I would think twice before signing up for a service that charges me $10/month - not sure why this should be any different.

Also, how do you opt-out? Imagine a random developer in a startup, running a Gitlab instance and then pushing a Go module there and only to be left with inexplicable traffic pattern(and bill). I have no skin in the game but this default _does not_ sound reasonable to me, whichever way you slice it.

As an aside, I have done this and I do not have the problem. The most interesting question for me in all of this has always been, what triggers Google's request load?
AWS gouges the crap out of people with their egress fees, though. You can get 20TB of transfer for ~$5 a month from Hetzner plus a server to go with it.
Opt-out is still manual and undocumented process?

I am very concerned that "own assessment" of what is a DoS means that source code is expected to be hosted only on large platform or by large corporation which is another way to say that "the little guys don't matter".

Self hosting of source code should be an option and the proxy should be there to reduce the traffic load, not amplify or artificially increase that load despite the "level of DoS".

One thing Drew is asking for is to respect robots.txt to allow the operator to determine what a reasonable level is for that operator and not apply a github bias to it.

This way of thinking is the exact opposite of what we need to do in the IT for reducing our impact on the environment. 4 GB is an enormous amount of data. It's enough to listen to 21k hours of music, or having a small dump of all french wikipedia (no pics, main paragraph). It's enough to completely travel off-line in Germany. It's enough to watch 3 to 4 movies in a good resolution. And that's per day.

We must absolutely reduce our resource usage.

> but note that 4GB per day is 0.3704 Mbps

That's per Go repository. That's a non-trivial amount of egress data and probably adds up to thousands of dollars a month.

what? why isn't it only checking the checksum file for changes? :o
That 4GB figure is for a repo at git.lubar.me, a self-hosted git repo where – quoting the person running it – "I am the only person in the world using this Go module".

In this context, that seems like a lot. Of course the module mirror can't know about this context, but there are certainly a lot of scenarios where this is comparatively a lot of bandwidth. Not everyone is running beefy servers.

Seems like an exceedingly poor and unreasonable default, and it doesn't take much imagination to see how this could be improved fairly easily (e.g. scale to number of actual go gets would already be an improvement).

> A single module can produce as much as 4 GiB of daily traffic from Google.

That’s (upper bound) 4Gib times 2500 per hour. That’s not nothing.

Per Go module repository, no?