
by edsrzf 3278 days ago

I normally hate it when people immediately trot out the old "premature optimization" quote, but it really applies here.

Please don't go around naming all your returns just because today's compiler happens to generate better code with them. This is a compiler issue that I'm confident will be fixed one day, especially if you do the right thing and file an issue.

But by all means, if you're profiling and your inner loops are actually slowed down by this, then make the change. And add a comment so that someone might be able to change it back some day when the compiler's improved.
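
For readers unfamiliar with the two styles under discussion, here is a minimal sketch of the same function written with unnamed and with named results. `ObjectInfo`, `lookupUnnamed`, and `lookupNamed` are hypothetical names invented for illustration, not code from the article, and the note on `lookupNamed` is only an example of the kind of comment suggested above.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectInfo is a hypothetical struct, large enough that returning it by
// value involves a real copy.
type ObjectInfo struct {
	Name string
	Size int64
	Tags [8]string
}

// lookupUnnamed uses the conventional style: unnamed results and explicit
// values on every return.
func lookupUnnamed(name string) (ObjectInfo, error) {
	if name == "" {
		return ObjectInfo{}, errors.New("empty name")
	}
	return ObjectInfo{Name: name, Size: int64(len(name))}, nil
}

// lookupNamed returns the same values through named results.
// NOTE (example comment): named results are used here only because profiling
// showed a win with the current compiler. Revert to the unnamed style once
// the compiler generates equivalent code for both.
func lookupNamed(name string) (info ObjectInfo, err error) {
	if name == "" {
		err = errors.New("empty name")
		return
	}
	info.Name = name
	info.Size = int64(len(name))
	return
}

func main() {
	a, errA := lookupUnnamed("bucket/object")
	b, errB := lookupNamed("bucket/object")
	// Both styles produce the same result for the same input.
	fmt.Println(a == b, errA == nil && errB == nil)
}
```

The two functions are semantically identical; any difference would be in the generated code, which this sketch does not measure.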

bradfitz 3277 days ago

I filed a Go bug: https://github.com/golang/go/issues/20859

We should just fix the compiler.

runeks 3277 days ago

I'm surprised Go doesn't compile down to an IR language where these differences in syntax are represented in a single manner. Seems like different ways to write the same thing.

bradfitz 3277 days ago

It increasingly does.

But it's been a process.

Go 1.5 was the first self-hosting release, with the Go compiler auto-translated from C to Go. But it was still fundamentally Ken's C compiler in Go syntax.

Every release since (Go 1.6, Go 1.7, Go 1.8, Go 1.9) has been cleaning it up and making it more Go like and less C like.

Meanwhile, the backend was also retrofitted. Go 1.7 included an SSA backend for amd64 (https://golang.org/doc/go1.7#compiler).

Go 1.8 included it for all architectures and added more SSA goodness.

Go 1.9 adds yet more, but some things are still not pushed down into SSA as well as they could be. (e.g. https://github.com/golang/go/issues/5147#issuecomment-247685...)

Nowadays we can add optimizations much more easily, including writing matching rules like in https://github.com/golang/go/blob/master/src/cmd/compile/int...

Meanwhile the whole toolchain keeps getting cleaned up and more hackable.

In Go 1.9, the compiler is now parallelized, which would've been impossible earlier. (https://tip.golang.org/doc/go1.9#parallel-compile)

So, it keeps improving. Just remember the Hello World compiler we started with.

Also amusing in retrospect is that when Go first came out, despite having a very basic compiler at the time, people coming from scripting languages thought we were so fast.

pcwalton 3277 days ago

Just for what it's worth, the way I'd fix that problem in the compiler would be to implement dead store elimination via global value numbering. With trivial alias analysis, the compiler would be able to detect that the result of the "duffzero" instruction (which I assume is a memset) is always killed by the "duffcopy" instructions and would eliminate it.

See this article for how it's done in LLVM: http://blog.llvm.org/2009/12/introduction-to-load-eliminatio...
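
A source-level sketch of the pattern described above may help: a destination is zeroed and then completely overwritten before anything reads it, so the first store is dead. `Block`, `source`, `zeroThenCopy`, and `copyOnly` are hypothetical names, and the mapping to "duffzero" and "duffcopy" in the comments is an analogy only; this is not a claim about what the Go compiler emits for this exact source.

```go
package main

import "fmt"

// Block is a hypothetical large value type.
type Block struct {
	Data [64]int64
}

// source produces the value that will be copied into the destination.
func source() Block {
	var b Block
	for i := range b.Data {
		b.Data[i] = int64(i)
	}
	return b
}

// zeroThenCopy spells out the redundant work: dst is zeroed and then fully
// overwritten before anything reads it, so the zeroing is a dead store.
func zeroThenCopy() Block {
	var dst Block  // store 1: zero every element (the "duffzero" analogue)
	dst = source() // store 2: overwrite every element (the "duffcopy" analogue)
	return dst
}

// copyOnly is what remains once the dead store has been eliminated.
func copyOnly() Block {
	return source()
}

func main() {
	// The observable result is the same with or without the first store.
	fmt.Println(zeroThenCopy() == copyOnly())
}
```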

heythere124 3277 days ago

Not my area of expertise and it is yours, but if you eliminate a zero instruction before a copy instruction how can you be sure that doesn't affect other threads?

  var x int
  // Pass x to a thread by reference
  x = 0
  time.Sleep(1000)
  x = 1

the8472 3277 days ago

How can you be sure that the other threads ever see it in time? They might be suspended for a whole second because an HDD needs to spin up or something like that.

So threads never seeing the value is already a valid outcome, so the compiler might as well always do that.

Filligree 3277 days ago

The answer to that one would be to embed thread-safety in the type system, a.k.a. Rust.

For languages with less sophisticated type systems you get a choice between inefficiency (Go), or complicated rules which state that the programmer is wrong for coding that way (C).

pcwalton 3277 days ago

The memory dependence analysis must prove the memory is unaliased, which ensures among other things that no other thread can have a reference to it. Presumably in Go return pointers are guaranteed to be unaliased.
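
To illustrate why the "unaliased" condition matters, here is a single-goroutine sketch in which a second reference to the destination makes the intermediate zero value observable. A callback stands in for "some other code holding a reference" so that the example has no data race; `overwrite` and `observeIntermediate` are hypothetical helpers, not part of any compiler or library.

```go
package main

import "fmt"

// overwrite zeroes *dst, runs observe, then stores v.
// If observe can reach *dst through an alias, the zeroing is visible and is
// therefore not a dead store.
func overwrite(dst *[4]int, v [4]int, observe func()) {
	*dst = [4]int{}
	observe()
	*dst = v
}

// observeIntermediate reads the destination through a second pointer between
// the two stores and reports what it saw, plus the final contents.
func observeIntermediate() (seen, final [4]int) {
	buf := [4]int{9, 9, 9, 9}
	alias := &buf // second reference to the same memory
	overwrite(&buf, [4]int{1, 2, 3, 4}, func() { seen = *alias })
	return seen, buf
}

func main() {
	seen, final := observeIntermediate()
	fmt.Println(seen, final) // prints "[0 0 0 0] [1 2 3 4]"
}
```

If no alias exists, nothing can read the zeroed state, which is the situation in which removing the first store is safe.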

dom0 3277 days ago

Why does Go not use LLVM? Are there technical reasons to reinvent the wheel, or is it just because LLVM is Apple's pet?

_pctq 3277 days ago

I hope this won't come out harsher than I intend, but I'm so tired of hearing the expression "not reinventing the wheel" used to justify using third party code. This is not what it means.

Note that there is not a single wheel that was built once in prehistory and is now lent to every human who needs it. People build wheels every day to fit their needs, reusing the concept of the wheel, that is, knowing that a circular object allows for smooth movement with less friction. The analogy in software development means that you'd better know of designs that help you solve your problem, not that you should blindly use code built by someone else to bypass the whole problem solving. Blindly reusing code is basically trying to use a bicycle wheel for everything. This may work well on another bicycle, not on a car.

bradfitz 3277 days ago

See this answer from Russ Cox: https://news.ycombinator.com/item?id=8817990

dom0 3277 days ago

Thank you for an insightful answer.

pkroll 3277 days ago

First part of the "Implementation" section: they thought it was too large and slow for their compiler speed goals.

https://golang.org/doc/faq#Implementation

weberc2 3277 days ago

I'm pretty sure Go does compile down to an IR, but it's little more than an abstraction over different architectures. I could be wrong.

hornetblack 3277 days ago

Originally it compiled to Plan9 assembly which is cross platform. They have an SSA backend for some architectures now.

weberc2 3277 days ago

I thought the SSA backend was not replacing the Plan9 assembly, but that it was a phase that happened before the assembly was output (presumably SSA is a phase and not an IR?).

hornetblack 3272 days ago

Oh. That actually makes more sense.

noselasd 3277 days ago

It does if you use gcc though - both variants of the code in question here compile to the same assembly (at least with gcc 7.1 on x86-64)

wlll 3277 days ago

I think accusations of premature optimisation might be a little unfair here.

Ignoring the style issues for a second (I'll pick that up later), if I'm looking at some code and there are two equally viable ways of writing it, one of which saves a chunk of memory* or is faster, then it's just perverse to choose the larger/slower code. I do this with regular expressions versus string functions. I see people use regexes a lot, but the tool I reach for first when doing string operations is the built-in string functions, e.g. https://golang.org/pkg/strings/#Contains or https://ruby-doc.org/core-2.4.0/String.html#method-i-start_w.... I'm not optimising, I'm just not de-optimising.

Back to the style issue: if you really feel strongly about the way it looks in the editor or in documentation, then I can see why you would choose one method over the other, and choosing the less desirable but theoretically faster code would absolutely be a case of premature optimisation. I personally don't have a particular preference either way, however, and feel that the reasons outlined in the style guide are rather fragile. So ultimately, if I pick up some code full of named return values I don't think it would bother me, in the same way that code that uses none, or mixes them where someone thought it appropriate, doesn't bother me either.

* There may be benefits other than just disk/distribution size. Many years ago I read about the benefits of small binaries, something relating to CPU caches, though that may be out of date now and I forget the details.
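
As a concrete Go version of the string-function-versus-regex choice mentioned above, the sketch below pairs `strings.HasPrefix` and `strings.Contains` with regular expressions that answer the same question. The wrapper names and sample inputs are hypothetical; the sketch only shows that the two approaches agree, and makes no claim about their relative speed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Both patterns are compiled once at package initialisation.
var (
	prefixRE   = regexp.MustCompile(`^something`) // anchored at start of input
	containsRE = regexp.MustCompile(`needle`)     // matches anywhere
)

// String-function versions.
func hasPrefixString(s string) bool { return strings.HasPrefix(s, "something") }
func containsString(s string) bool  { return strings.Contains(s, "needle") }

// Regex versions answering the same questions.
func hasPrefixRegex(s string) bool { return prefixRE.MatchString(s) }
func containsRegex(s string) bool  { return containsRE.MatchString(s) }

func main() {
	samples := []string{"something else", "not something", "hay needle hay", ""}
	for _, s := range samples {
		// Each pair should agree on every input.
		fmt.Println(hasPrefixString(s) == hasPrefixRegex(s),
			containsString(s) == containsRegex(s))
	}
}
```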

hyperpape 3277 days ago

The issue is that with the string/regex issue, there's a good reason that the string operation should be faster. Maybe a "sufficiently smart compiler" could optimize it in some cases of static regexes, but it's at least complicated. In cases where the regex is dynamically determined, even the sufficiently smart compiler probably can't optimize away the regex.

In contrast, the case in this article seems like table stakes for an optimizing compiler. It's just not eliminating common subexpressions. There's no reason to contort your code around something that should automatically happen.

wlll 3277 days ago

When I've benchmarked regexes before (Ruby, ISTR) there have been situations where the regex engine optimised the code to be comparable to string functions; I can't remember the details though. Just for fun I benchmarked =~ /\Asomething/ and start_with?("something"), a situation that seems like it could be optimised by the regex engine, but the string function is still faster (Ruby 2.3.1):

  start_with: 5703143.1 i/s
  regex:      2821224.4 i/s - 2.02x slower

That aside:

> There's no reason to contort your code around something that should automatically happen.

Absolutely, but it's a question of style at that point. "Contorting your code" suggests using a less desirable style/syntax for some gain, and I'd agree that would be premature optimisation. If you're just making a choice between two styles that you consider to be pretty much equal then it's just pragmatic.

edit: Forgot to add, yes, I agree that this should happen automatically in the compiler :)

hyperpape 3277 days ago

My point of view is that if you're choosing between two equally good styles based on the compiler, it's contorting your code. One doesn't have to be better than the other; if you can't make that choice between semantically identical options, you're conforming to an arbitrary standard.

obstinate 3278 days ago

I dunno. A 30% code size win is non-trivial. I'm all for filing an issue first and seeing how likely it is that there is uptake from the dev team and desire to fix the problem. However, if no fix is forthcoming . . . code size has fairly well known effects on performance.

gizmo686 3277 days ago

A 30% code size reduction in code that does little other than construct and return a value. I have certainly seen individual functions where this is the case, but across an entire program, you will not get anywhere near 30% size reduction.

Having said that, this is certainly something that should be fixed in the compiler.

On a related note, in the final assembly, the compiler could also have optimized the 4 RETs into 1, then optimized away all of the conditionals, turning the sample code into the equivalent of "return objectInfo()". Of course, in a real example, these optimizations would not be possible; but they do show that these reduced cases are not the best way of benchmarking performance.
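
The collapse described above can be sketched as follows. The shape of `branchy` is an assumption based only on the mention of 4 RETs and `objectInfo()`; it is not the article's actual sample code, and `ObjectInfo`, `branchy`, and `collapsed` are hypothetical names. The stand-in `objectInfo` has no side effects, which is what makes the two forms interchangeable.

```go
package main

import "fmt"

// ObjectInfo is a hypothetical result type.
type ObjectInfo struct {
	Bucket string
	Name   string
}

// objectInfo is a side-effect-free stand-in for the function being returned.
func objectInfo() ObjectInfo {
	return ObjectInfo{Bucket: "bucket", Name: "object"}
}

// branchy mimics a reduced test case: several conditionals, but every path
// returns the result of the same call, giving four return sites.
func branchy(a, b, c bool) ObjectInfo {
	if a {
		return objectInfo()
	}
	if b {
		return objectInfo()
	}
	if c {
		return objectInfo()
	}
	return objectInfo()
}

// collapsed is the single-return equivalent: the conditions no longer matter.
func collapsed(a, b, c bool) ObjectInfo {
	return objectInfo()
}

func main() {
	fmt.Println(branchy(true, false, true) == collapsed(true, false, true))
}
```

In real code the branches would return different values or have side effects, so no such collapse would be available.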

lobster_johnson 3277 days ago

Fixing it in the compiler will fix it for everyone, however, which is a big argument for fixing it upstream.

pjmlp 3277 days ago

Unless it actually impacts the application's use case, and a profiler has confirmed that this is indeed the case, it is just cargo cult optimization.