by returningfory2 1891 days ago
In this case, what is the point the blog post is trying to make?

The title of the post is "Go Executable Files Are Still Getting Larger". Upon further reading and conversation here it seems this is possibly not true, nor what the post is about. If we believe Russ's comments, Go executable sizes haven't increased much in general. Perhaps the reason you're seeing increases in Cockroach DB is because you keep writing more code for Cockroach DB?

Now the point has shifted to this notion of "dark bytes". So the article is about ... how the way you previously diagnosed the contents of binaries doesn't work anymore? That's fine and legitimate, but it seems like the point is over-extrapolated to become a criticism of the Go team.


> Go executable sizes haven't increased much in general.

Russ's example was just the "gofmt" program.

> Perhaps the reason you're seeing increases in Cockroach DB is because you keep writing more code for Cockroach DB?

If that were the only reason, then the percentage overhead would remain roughly constant. But it is increasing. So there is a non-linear factor for _some_ Go programs (like CockroachDB), and it's still unclear what that factor is.
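The "constant overhead" argument can be made precise with a short derivation, assuming the unattributed bytes scale linearly with the attributed ones. Let $C_v$ be the bytes attributable to symbols in release $v$, $D_v$ the unattributed ("dark") bytes, and $k$ a fixed constant; these symbols are introduced here for illustration only.

```latex
D_v = k\,C_v
\quad\Longrightarrow\quad
\frac{D_v}{C_v + D_v} = \frac{k\,C_v}{(1+k)\,C_v} = \frac{k}{1+k}
```

Under that assumption the dark fraction does not depend on $v$. A fraction that rises from release to release therefore means $D_v / C_v$ is itself growing, i.e. the dark bytes grow faster than the attributed bytes.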

It's not clear that the overhead is due to Go itself producing bigger binaries over time though. If you recompiled all the different CockroachDB versions with Go 1.8 (if that was feasible), it's quite probable that the tables you would end up with would look fairly similar to the ones you're actually showing.

If there is superlinear growth in binary sizes as the project grows – for example, if some part is O(n^2) in the number of interfaces – then that's certainly interesting. If you demonstrated that such superlinear growth is happening, and wrote an article based on that, people wouldn't be so critical.
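One way to check for the superlinear growth described above is to fit size against a project-size proxy on a log-log scale: if size ≈ c·n^b, the slope of log(size) vs log(n) estimates b. This is a minimal sketch, not an analysis anyone in the thread ran; the proxy n (for example attributed bytes or number of interfaces per release) and the function name `growthExponent` are assumptions, and the data in `main` is synthetic, not a measurement.

```go
package main

import (
	"fmt"
	"math"
)

// growthExponent fits size ≈ c * n^b by ordinary least squares on
// (log n, log size) and returns the estimated exponent b.
// b ≈ 1 suggests linear growth; b clearly above 1 suggests superlinear growth.
// It needs at least two distinct, positive n values.
func growthExponent(n, size []float64) float64 {
	var sx, sy, sxx, sxy float64
	k := float64(len(n))
	for i := range n {
		x := math.Log(n[i])
		y := math.Log(size[i])
		sx += x
		sy += y
		sxx += x * x
		sxy += x * y
	}
	return (k*sxy - sx*sy) / (k*sxx - sx*sx)
}

func main() {
	// Synthetic data only: size = 3 * n^2, i.e. quadratic growth.
	n := []float64{10, 20, 40, 80}
	size := make([]float64, len(n))
	for i, v := range n {
		size[i] = 3 * v * v
	}
	fmt.Printf("estimated exponent: %.3f\n", growthExponent(n, size)) // prints: estimated exponent: 2.000
}
```

With real per-release numbers, the interesting comparison would be the exponent of dark bytes against attributed bytes rather than total size against time.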

If Go binaries are getting bigger because Go produces bigger binaries for the same source code over time, then that's also interesting. If you demonstrated that Go binaries are getting more and more bloated over time for the same source code, and wrote an article based on that, people wouldn't be critical.

But as it is, you kind of just complained that CockroachDB is getting bigger, tried to blame it partly on the Go compiler producing more bloated code over time, partly on a mystical "dark area" which you don't understand, you mentioned superlinear growth only in the comment section, and you didn't actually gather data or do experiments to prove or disprove any of the things you're claiming as a cause. That's why people are complaining.

> tried to blame it partly on the Go compiler producing more bloated code over time

Where? The argument is _precisely_ that the growth is occurring in non-code areas.

> partly on a mystical "dark area" which you don't understand

The _observation_ is that the growth is happening in an area of the file that's not accounted for in the symtable. That's what makes it "dark". It's not mythical: it's _there_ and you can observe it just as well as anyone else.

> you mentioned superlinear growth only in the comment section

It's in the reported measurements.

> and you didn't actually gather data or do experiments to prove or disprove any of the things you're claiming as a cause

The analysis is stating observations and reporting that the size is increasingly due to non-accounted data. That observation is substantiated by measurements. There's no claim of cause in the text!

> Where? The argument is _precisely_ that the growth is occurring in non-code areas.

But how is this important? If the thing you're optimizing for is "total Go binary size", then all that matters is the total size of binary! How bytes are organized internally is irrelevant to this metric.

You should redo the analysis where you compile an old version of Cockroach DB (say v1.0.0) with Go versions 1.8 through 1.16, and then see what the numbers say. Your current analysis, which either doesn't account for growth in the code base at all or tries to account for it by deep-diving into the internal organization of the binary, is not sound.
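The controlled experiment suggested here (same source, several toolchains) could be scripted roughly as follows. It is a sketch under stated assumptions: each toolchain is assumed to be installed as its own command named like `go1.16` (for example via the golang.org/dl wrappers, if available for that version), and the toolchain list, the program, and the helper names `buildSize` and `pctChange` are hypothetical. As noted earlier in the thread, an old CockroachDB release may not build with every toolchain, and pre-module toolchains such as Go 1.8 expect a GOPATH layout.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// Hypothetical toolchain command names; adjust to what is installed.
var toolchains = []string{"go1.8", "go1.12", "go1.16"}

// buildSize builds pkg with the given toolchain command and returns the
// size in bytes of the resulting executable.
func buildSize(goCmd, pkg, outDir string) (int64, error) {
	out := filepath.Join(outDir, "bin-"+goCmd)
	cmd := exec.Command(goCmd, "build", "-o", out, pkg)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		return 0, err
	}
	st, err := os.Stat(out)
	if err != nil {
		return 0, err
	}
	return st.Size(), nil
}

// pctChange reports each size's change relative to the first one, in percent.
func pctChange(sizes []int64) []float64 {
	res := make([]float64, len(sizes))
	if len(sizes) == 0 || sizes[0] == 0 {
		return res
	}
	for i, s := range sizes {
		res[i] = 100 * float64(s-sizes[0]) / float64(sizes[0])
	}
	return res
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: buildsizes <package-path>")
		os.Exit(2)
	}
	pkg := os.Args[1]
	outDir, err := os.MkdirTemp("", "gosizes")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(outDir)

	var names []string
	var sizes []int64
	for _, tc := range toolchains {
		n, err := buildSize(tc, pkg, outDir)
		if err != nil {
			// Skip toolchains that cannot build this source.
			fmt.Fprintf(os.Stderr, "%s: %v\n", tc, err)
			continue
		}
		names = append(names, tc)
		sizes = append(sizes, n)
	}
	for i, p := range pctChange(sizes) {
		fmt.Printf("%-8s %12d bytes %+7.1f%%\n", names[i], sizes[i], p)
	}
}
```

Holding the source fixed is the point of the design: any size change across rows can then only come from the toolchain, not from new code.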

> all that matters is the total size of binary! How bytes are organized internally is irrelevant to this metric.

Not quite, if the task is to reduce the metric.

When the size is attributed to data/code that's linked to the source code, then we know how to reduce the final file size (by removing data/code from the source code, or reducing them).

When the size is non-attributed and/or non-explained (“dark”) we are lacking a control to make the size smaller over time.
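The attributed vs. unattributed split described above can be approximated by comparing the file size with the sum of symbol sizes in the symbol table. This is a rough sketch, not the tooling used in the article: it assumes a Linux ELF binary that has not been stripped, it does not deduplicate overlapping symbols, and everything without a sized symbol (headers, padding, debug info, the symbol and string tables themselves, zero-sized symbols) lands in the "unattributed" bucket. The program and helper names are hypothetical.

```go
package main

import (
	"debug/elf"
	"errors"
	"fmt"
	"os"
)

// darkBytes returns how many of fileSize bytes are not covered by the given
// symbol sizes. Overlapping symbols are not deduplicated, so this is an estimate.
func darkBytes(fileSize int64, symSizes []uint64) int64 {
	var covered uint64
	for _, s := range symSizes {
		covered += s
	}
	return fileSize - int64(covered)
}

// fileBackedSymbolSizes collects sizes of symbols whose section occupies
// bytes in the file, skipping undefined/special section indices and
// SHT_NOBITS sections such as .bss (memory only, no file bytes).
func fileBackedSymbolSizes(f *elf.File) ([]uint64, error) {
	syms, err := f.Symbols()
	if err != nil {
		return nil, err
	}
	sizes := make([]uint64, 0, len(syms))
	for _, s := range syms {
		idx := int(s.Section)
		if s.Section == elf.SHN_UNDEF || idx >= len(f.Sections) {
			continue // undefined, absolute, or common symbol
		}
		if f.Sections[idx].Type == elf.SHT_NOBITS {
			continue
		}
		sizes = append(sizes, s.Size)
	}
	return sizes, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: darkbytes <elf-binary>")
		os.Exit(2)
	}
	path := os.Args[1]
	st, err := os.Stat(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	f, err := elf.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	sizes, err := fileBackedSymbolSizes(f)
	if errors.Is(err, elf.ErrNoSymbols) {
		fmt.Fprintln(os.Stderr, "no symbol table (stripped binary?)")
		return
	} else if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	dark := darkBytes(st.Size(), sizes)
	fmt.Printf("file: %d bytes, unattributed: %d bytes (%.1f%%)\n",
		st.Size(), dark, 100*float64(dark)/float64(st.Size()))
}
```

Tracking that last percentage across releases is one concrete way to see whether the "dark" share is growing.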

You keep saying it’s unexplained as if it’s intentionally kept secret. You pretend you have no control over it, but if you reduced your own source code, you would find that the “dark” space shrunk.

The Go source code is available to you. Russ has pointed out that there's no existing tool to break down those “dark” bytes, but also that they do serve a purpose. Perhaps you could work on that tool instead of complaining that they're not covered by the symbol table.

Hmm, I think I understand where we have a misunderstanding. I - and presumably many others - interpreted the article to make the claim that newer Go versions are producing more bloated Go executables. There are multiple parts of the article which can be read that way. But if you're not doing that, and you're just trying to investigate why the CockroachDB binary is getting bigger over time, then that's a different matter.

I'm not going to respond point by point because those points are kind of moot if my accusations were based on an incorrect reading.

CockroachDB has always had a reputation for being slow, 10x-20x slower than the same operation in Postgres. Given this and the issues with binary size, was a GC language like Go the right choice for CockroachDB? Would you pick something else today if starting anew?

My intuition is that later versions of crdb are more like 1/3 the efficiency of Postgres per core. GC is some of that, but I don’t think it’s all that much.

Everything has trade-offs. Go is not the easiest language in which to write highly efficient code. Cockroach generates some code to help here. Certainly at this point there’s pain around tracking memory usage so as to not OOM, and there’s pain around just controlling scheduling priorities. But then again, had it been C++ or Rust, perhaps getting to table-stakes correctness would have taken so long that it wouldn’t have mattered.

Some cost just comes from running distributed replication and concurrency control. That’s unavoidable. Some also comes from lack of optimization. Postgres has been around a long time and has some very optimized things in its execution engine.

Also, if you run Postgres in SERIALIZABLE, it actually does quite badly, largely because that isolation level was bolted on and the concurrency control isn’t optimized for it. Crdb was core-for-core competitive in serializable on some workloads last time I checked.

Being somewhat familiar with the CockroachDB project, I doubt that the claimed performance difference is linked to the programming language. It's more about mandatory 3-way (or more) replication on every write, and several additional layers of protection against hardware failures, network problems, etc., which Postgres does not have.