

by johnduhart 1489 days ago

> I mean, you didn't even consider implementing a simple fetch of an already cloned repository in your mirroring server code. So yeah, I'd argue that the bad faith part is actually justified.

https://github.com/golang/go/issues/44577#issuecomment-11378...

> We did consider caching clones, but it has security implications and adds complexity, so we decided not to. It is certainly not trivial to do and not something we are likely to do based on this issue.

Drew continues to act as though he is always correct, and any viewpoint that isn't his is just moronic. I've repeatedly seen this behavior from him in multiple venues over the years, and I'm happy to see the wider community start calling this out as childish.

4 comments

ziml77 1489 days ago

I don't particularly care for Drew, but the issue he's reported here seems totally valid. And if he requested that he be excluded from getting hit by the crawler, wouldn't that mean it would be impossible for people to use packages from sr.ht unless they change their config?

Plus, it does seem reasonable to think that only one of the crawlers needs to hit the site. The global replication can happen at the FS level or, heck, the crawlers can just perform pulls from each other.
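The "only one of the crawlers needs to hit the site" idea can be sketched roughly as below. This is a minimal illustration, not the Go proxy's design: `SharedFetchCache`, its `fetch` callback, and the TTL value are hypothetical names, and an in-process dict stands in for whatever shared storage the replicas would actually use.

```python
import time
from typing import Callable, Dict, Tuple


class SharedFetchCache:
    """Serve repeated requests for the same URL from one upstream fetch per TTL window."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical function that hits the origin
        self._ttl = ttl_seconds      # how long a fetched result counts as fresh
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh enough: no origin traffic
        payload = self._fetch(url)   # stale or missing: one origin request
        self._entries[url] = (now, payload)
        return payload
```

With a cache like this in front of the origin, any number of crawler nodes asking for the same repository within the TTL would cause a single upstream request.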

tptacek 1489 days ago

No. According to the Go project, adding his site to the exclusion list would reduce traffic to his site at the cost of freshness of the data the proxy collects; it would not make it "impossible" for people to use packages from sr.ht.

This is all in the thread that DeVault linked to from his post.

tete 1488 days ago

Which would still be far from great for any kind of source hosting website.

tptacek 1488 days ago

In what way would it be "far from great"?

tete 1488 days ago

Well, why have that proxy/functionality in the first place if the best option is to disable it?

tptacek 1488 days ago

I don't even understand the question you're asking. Nobody is suggesting the proxy be disabled, including for sr.ht.

It's OK not to know the specifics of what this is about, but it's weird to have strong opinions about it if you don't.

AlotOfReading 1489 days ago

Drew can often be very abrasive, but does it really matter in this case? His site is basically being DDoS'd.

Yes, there are decent arguments why the golang infra doesn't cache or respect typical norms like robots.txt, but they don't change the unreasonableness of the underlying situation. Surely some mitigation could have been worked out in the year since the ticket was filed?

res0nat0r 1489 days ago

It appears they offered to turn off refreshing for his domain on Jun 8, 2021: https://github.com/golang/go/issues/44577#issuecomment-85692...

prepend 1489 days ago

That doesn’t seem like a solution at all and is actually kind of punitive, as that would make sr.ht bad for hosting Go.

I think this is just an example of Google being a jerk and not caring enough to do proper software engineering.

Go seems really interesting but I have avoided using it because it’s so tied to Google. And I don’t trust Google to make good decisions for developers or users.

lupire 1489 days ago

It looks like a solution to me: Google stops proactive refreshing, and so users get data that is fresh up to the cache timeout.

Users who can't wait that long can disable the proxy, and SourceHut can recommend users do that.

opmac 1488 days ago

Perfect succinct response. It is a 100% viable workaround.

tptacek 1489 days ago

Can you articulate why it isn't a solution, and how it would be punitive? There are people on this thread who appear to believe Google's workaround would mean that repositories hosted on sr.ht would be unusable as Go modules, which is not at all the case.

rurban 1489 days ago

Drew articulated very well why Google's offer doesn't help at all.

https://github.com/golang/go/issues/44577#issuecomment-85693...

A full git clone that DDoSes a host just to check whether the user experience is still first-class, and to fill a proxy, is not an acceptable solution for a module host who has to pay the hosting bills himself.

If they want to know whether their proxy is still up to date, a cheap latest-change request 8x/hour would be appropriate.
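One way to picture such a cheap check is to compare ref hashes instead of cloning: `git ls-remote` lists a remote's refs and the commits they point to without transferring repository contents. The sketch below assumes the previously seen refs are stored somewhere; `parse_ls_remote`, `needs_refetch`, and `list_remote_refs` are hypothetical helper names, and this is not a description of how the Go proxy works.

```python
import subprocess
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Turn `git ls-remote` output ("<sha>\\t<ref>" per line) into {ref: sha}."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, ref = line.split(None, 1)
        refs[ref] = sha
    return refs


def needs_refetch(cached_refs: Dict[str, str], current_refs: Dict[str, str]) -> bool:
    """A full fetch is only worth doing if some ref moved, appeared, or vanished."""
    return cached_refs != current_refs


def list_remote_refs(url: str) -> Dict[str, str]:
    """Ask the remote for its refs only; requires git on PATH and network access."""
    result = subprocess.run(["git", "ls-remote", url],
                            capture_output=True, text=True, check=True)
    return parse_ls_remote(result.stdout)
```

A poller built this way would call `list_remote_refs` on a schedule and fall back to a real fetch only when `needs_refetch` returns `True`.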

> Have you considered the robots.txt approach, which would simply allow the sysadmin to tune the rate at which you will scrape their service? The best option puts the controls in the hands of the sysadmins you're affecting. This is what the rest of the internet does.

> Also, this probably isn't what you want to hear, but maybe the proxy is a bad idea in the first place. For my part, I use GOPROXY=direct for privacy/cache-breaking reasons, and I have found that many Go projects actually have broken dependencies that are only held up because they're in the Go proxy cache, which is an accident waiting to happen. Privacy concerns, engineering problems like this, and DDoSing hosting providers, this doesn't look like the best rep sheet for GOPROXY in general.

tptacek 1488 days ago

You didn't answer my question. What's the problem with the Go team's workaround? I get that DeVault would like to redesign the Go modules system to suit his own preferences, but that's not on the table.

ocdtrekkie 1489 days ago

In this case, it's really hard to see thrashing other people's servers relentlessly to collect data you already have as anything but incredibly, incredibly poor engineering. Y'all should write him a check for that much resource waste.

rasz 1489 days ago

This comes to mind: https://news.ycombinator.com/item?id=31496063

> At Google we were told to stop thinking about all this stuff, that the storage hardware and software people were responsible for hiding things like wearout from application developers.

Something tells me this team was told to "stop thinking about all this stuff, that the network people were responsible for hiding things like speed, latency and cost from application developers." aka network is infinite, keep pounding that repo and we will scale accordingly (our side of the equation, sucks to be other people)

adamrezich 1489 days ago

without knowing anything about this situation outside of this thread and the post it links to, it comes across as willful negligence to screw over someone who was a bother in past community transgressions

tptacek 1489 days ago

That's a risible suggestion. Even DeVault doesn't say that.

verdverm 1489 days ago

It's less than $200 per month to send out 4G daily. If his business can't afford that, there is something else going on.

What is the total daily bandwidth that sourcehut uses anyway? What percentage is go module fetching?
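For scale, the arithmetic behind that figure can be sketched as follows. This assumes "4G" means 4 GB and a 30-day month; the function names are illustrative, and the resulting per-GB number is only the price that a $200 bill would imply, not any provider's actual rate.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


def monthly_gb(daily_gb: float, days: int = DAYS_PER_MONTH) -> float:
    """Total outbound volume for a month at a constant daily rate."""
    return daily_gb * days


def implied_price_per_gb(monthly_bill_usd: float, daily_gb: float,
                         days: int = DAYS_PER_MONTH) -> float:
    """Per-GB egress price at which the monthly bill would reach the given amount."""
    return monthly_bill_usd / monthly_gb(daily_gb, days)


print(monthly_gb(4))                            # 120
print(round(implied_price_per_gb(200, 4), 2))   # 1.67
```

So 4 GB a day is about 120 GB a month, and the bill stays under $200 as long as egress costs less than roughly $1.67 per GB.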

cycomanic 1489 days ago

The 4G daily was a different user who hosted a go module where he was the single user on his own server, this was not DeVault.

I'd be pretty pissed if I hosted a Go module essentially for myself and suddenly had a $200 bill because Google decided to clone my repository 500 times a day. If it doesn't bother you, how about you donate $200 a month to a charity of my choosing, since it doesn't matter to you?

verdverm 1489 days ago

Self-hosting costs money. For this one user, it would seem that blocking or other options are more tenable.

If money was a problem, I'd expect this individual to have rectified it on their end.

cycomanic 1489 days ago

So tell me why do people use DDoS protection? It's just money. If you run a server you should be able to eat all the cost!

Seriously do you follow through what your arguments actually mean if applied in general?

verdverm 1488 days ago

There was no actual DDoS, so no need to compare

Should every language be responsible for paying the bandwidth bills for dependencies?

You might look at the most recent comment from the Go team on the issue. There have been no additional requests or events since they last resolved it for both of the affected parties.

nirvdrum 1489 days ago

Plenty of bootstrapped businesses have better things to spend $200 / month on, let alone the time spent trying to figure out where all the anomalous traffic is coming from. As I understand it, it's not simple file fetches either. It's cloning a repo, which involves two-way communication, consumes CPU and RAM, and causes disk seeks. You're not slapping it on CloudFront and calling it a day. Finally, it looks to me like the costs are going to scale the more people he has using sourcehut and writing Go modules.

I don't really understand turning this around on him. Why should he have to subsidize Google? If it's not a problem, why do we have robots.txt at all? Just let bots hammer your site and cope with it.

The current situation can't be the optimal solution. It wasn't even present prior to Go 1.16. Only one company has the ability to change that. What should he do differently here? Why should he have to spend any of his time or money working around an issue he didn't create?

ocdtrekkie 1489 days ago

That was a different user. The fact that a user not running a git hosting service is potentially eating $200 a month should cue you in to the fact that the cost to Drew is likely drastically higher than that.

Google should be sending reimbursement checks for the damage done here on this issue.

verdverm 1489 days ago

Drew is running a code hosting business and this is a cost of providing a feature to the users. He can pass the costs on if it is a problem. He has lots of options and his competitors are not making a big deal out of this.

I suspect he's drawn his line in the sand and wants to keep it going rather than finding a solution that works without requiring upstream changes.

Arch-TK 1489 days ago

If I provide a paid service and someone abuses it I must deal with it because my larger competitors deal with it? It's good to know that small businesses have no place in the modern world.

stefan_ 1489 days ago

> but it has security implications and adds complexity

Read: we prefer to use your servers for caching. Not good enough. Maybe the issue is people making silly evasive arguments like these while the server load piles on?