In case someone cares about these things, I compared the build times and the binary sizes for 1.9 vs 1.8.3 using the open source project we maintain [1]. This is on a 6-core i7-5820K:
Build time with 1.8.3:
real 0m7.533s
user 0m36.913s
sys 0m2.856s
Build time with 1.9:
real 0m6.830s
user 0m35.082s
sys 0m2.384s
Binary size:
1.8.3 : 19929736 bytes
1.9 : 20004424 bytes
So... looks like the multi-threaded compilation indeed delivers better build times, but the binary size has increased slightly.
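To put the figures above in relative terms, here is a tiny sketch of the arithmetic. It is still a single measurement per compiler, so it says nothing about run-to-run variance. The helper name relChange is ours, purely for illustration.

```go
package main

import "fmt"

// relChange returns the relative change from before to after as a fraction;
// a negative result means the value went down.
func relChange(before, after float64) float64 {
	return (after - before) / before
}

func main() {
	// Figures copied from the 1.8.3 vs 1.9 runs above.
	fmt.Printf("real time:   %+.1f%%\n", 100*relChange(7.533, 6.830))     // -9.3%
	fmt.Printf("user time:   %+.1f%%\n", 100*relChange(36.913, 35.082))   // -5.0%
	fmt.Printf("binary size: %+.2f%%\n", 100*relChange(19929736, 20004424)) // +0.37%
}
```

So wall-clock time dropped by roughly 9% while the binary grew by well under half a percent (74,688 bytes).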
Unless you perform a proper statistical analysis it's unfair to draw a conclusion from a single run.
Furthermore, when I see a second run that's faster than the first one, I immediately wonder if it's the cache being cold for the first run and warm for the second.
In fairness, the phrase he used was "looks like". I don't think his comment was intended to suggest that he'd done rigorous and exhaustive wide-spectrum analysis of compile times and executable size, just that expectations matched the result for his project.
Thanks :) I'm no stranger to the scrutiny of Hacker News: I did 3 builds in a row and threw out the 1st one (cache); the last two were within 0.1s of each other, so I copied & pasted the latter.
"Programmers Need To Learn Statistics Or I Will Kill Them All"... What an insufferable asshat.
PSA: There is no reason to behave like this, and it is an incredible way to alienate a bunch of people. You either offend people directly with the murder implication, or they don't take you seriously because you sound like you're throwing a temper tantrum so extended that you managed to write a whole blog post about it.
How, concretely, should I go about doing this particular analysis of compile time for one project? How many times should I run the build with each of the 2 compilers, and what should I do with the results so that I can: 1. draw a conclusion; 2. come up with fair numbers for how they compare?
I hope someone can teach this (hopefully simple and very concrete) thing to the HN crowd, and I do hope the answer is not "go learn statistics".
You need to first create a clean slate each time you run the experiment: no cache, no FILESYSTEM cache, etc. Maybe a tonne of single-use Docker images? Even then, filesystem caches will mess you up a little.
Beyond that, you need to run the same build "several" times to see what the variance is. Without getting specific, if the builds are within a couple percent of each other, do "a few" and take the mean. If they're all over the place do "lots" and only stop once the mean stabilises. There are specific methods to define "lots" and "a few" but it's usually obvious for large effects and you don't need to worry too much about it.
If you're trying to prove that you've made a 0.1 improvement on an underlying process that is normally distributed with a stddev of, like 2, then you're going to have to run it a lot and do some maths to show when to stop and accept the result.
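The comment does not say which "maths" to use. One common textbook choice, assumed here, is the sample-size formula for a two-sample z-test: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2 runs per compiler. It assumes roughly normal, independent runs with equal standard deviation sigma for both versions. The function name runsPerVersion is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// runsPerVersion estimates how many builds of each compiler version are
// needed to detect a true mean difference delta when both versions have
// standard deviation sigma, at two-sided significance 0.05 and 80% power,
// using the normal approximation n = 2 * ((zAlpha + zBeta) * sigma / delta)^2.
func runsPerVersion(sigma, delta float64) int {
	const zAlpha = 1.959964 // standard normal quantile for two-sided alpha = 0.05
	const zBeta = 0.841621  // standard normal quantile for power = 0.80
	n := 2 * math.Pow((zAlpha+zBeta)*sigma/delta, 2)
	return int(math.Ceil(n))
}

func main() {
	// sigma = 2 and delta = 0.1, as in the comment above.
	fmt.Println(runsPerVersion(2, 0.1)) // prints 6280
}
```

That works out to roughly 6,300 builds per compiler, which is why a 0.1 effect against a standard deviation of 2 counts as "run it a lot". For a large effect (delta equal to sigma), the same formula asks for only about 16 runs each.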
I want measurements with filesystem cache because I'm interested in estimating the speed of the compile-test-edit cycle. If you want to estimate the impact on emerge then you'll want no filesystem cache.
It's all about measuring based on what you intend to use the measurements for.
If the measurements are all over the place, why not take the fastest? The average is no good, because it'll be influenced by the times it wasn't running as fast as possible.
I don't myself lose much sleep over worrying about the times it runs faster than possible.
I agree with this sentiment. Any time worse than the fastest is due to noise in the system (schedulers, etc.). So the fastest is the lowest-noise run.
Of course, as I said in another comment, it depends on what you want to do with the measurement. If you plan to estimate how long a run will take on an existing system, then you need to accept the noise and use the mean (or median).
Personally I think it's a better idea to instrument your programs and count the number of memory (block) accesses or something. That metric might actually be useful to a reader a few years in the future. The fact that your program was running faster on a modern x86 processor from the year 2010 tells me nothing about how it would perform today, unless the difference was so large that you never needed statistical testing in the first place...
Yes, the other guys are just being pedantic because the runtime attempts to load libc dynamically (but it is not required; DNS behaviour may just change without it).
Well, Go code is statically linked, but the runtime may try to dynamically load libc for DNS resolution. Using cgo of course changes everything drastically.
Only for a few things, like the system DNS resolver in the net package (which can be switched to the pure-Go version with a compile-time or run-time switch) and getting the user's home directory in the os/user package.
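The net package documentation describes the two switches mentioned here: the netgo build tag at compile time, and GODEBUG=netdns=go at run time. Individual resolvers can also ask for the pure-Go client through the PreferGo field of net.Resolver. A minimal sketch, where newPureGoResolver is our own hypothetical helper name:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newPureGoResolver returns a resolver that asks the net package to prefer
// its built-in Go DNS client over the libc (cgo) resolver where available.
func newPureGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	r := newPureGoResolver()
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("localhost resolves to", addrs)
}
```

The process-wide switches need no code changes at all: build with `go build -tags netgo`, or run the binary with `GODEBUG=netdns=go` in its environment.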
Yes indeed this is totally awesome. It’s a problem that occurs on any platform, and not only for testing. I often see this problem with logging as well, where some validation/helper function logs an error separate from the context it occurred in, potentially making it hard to trace without a stacktrace logged as well.
This is the biggest problem with Go errors and one of my biggest gripes with the language. Exceptions have stacktraces that give you context about where the error originated. Go errors don't have this, and it costs me a lot of time debugging things. https://godoc.org/github.com/pkg/errors helps, but it's still more of a pain than it should be.
Except you don't get to control the specific error type returned by packages you import. So sure, you could get stack traces for your code, but not for your dependencies.
It infuriates me when go proponents try and sweep bad language decisions under the rug with half-fixes.
To be fair, the Go authors have always strongly promoted that you fork and maintain your dependencies. That may be a bad operational decision (outside of Google), but is technically unrelated to the language itself.
Mind you, if third party code needs debugging, you're going to have to fork it in order to apply your fixes in a timely manner anyway. Perhaps their stance is not as crazy as it may originally seem.
Which, to be clear, is a language design mistake. Go would be a much more productive and friendly language if errors just had stack traces attached when they're created. There's no reason to have the poor developer manually add dozens of debug prints all over his (and sometimes third-party) code, in every place that could have returned the error, just to try to discern where it comes from. Or grep for strings in the error message, or throw darts at printouts of the source. That's just a massive step backwards from what we've had since the 80s and 90s. I know Go prides itself on being simple, but the errors really are too simple and we're all worse off because of it.
The problem is that Go errors are just a convention; they are not a feature of Go. It's just an interface; it could have been called FOO and it wouldn't have made a difference. Go has exceptions, they are called "panic", but they are inferior to Java's in the way they are handled.
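Both halves of that comment fit in a few lines. error is an ordinary interface with a single Error() string method, so any type can satisfy it; panic unwinds the stack until a deferred call uses recover. The sketch below (the names notFound and safeDivide are hypothetical) turns a runtime panic back into an error value.

```go
package main

import "fmt"

// notFound satisfies the predeclared error interface simply by having an
// Error() string method; nothing else marks it as an error type.
type notFound struct{ key string }

func (e notFound) Error() string { return "not found: " + e.key }

// safeDivide lets an integer division by zero panic, then uses recover in a
// deferred function to convert that panic into an ordinary error result.
func safeDivide(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	var err error = notFound{key: "user42"}
	fmt.Println(err) // not found: user42

	_, err = safeDivide(1, 0)
	fmt.Println(err)
}
```

Unlike a Java catch block, recover only works inside a deferred function and does not filter by exception type; it receives whatever value was passed to panic.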
Nice, can't wait to run some of our benchmarks against this. Go has the awesome property of always becoming a little bit faster with every release. It's like your code gets better without you doing anything. Love it :)
> Type enforcement can be static, catching potential errors at compile time, or dynamic, associating type information with values at run-time and consulting them as needed to detect imminent errors, or a combination of both.
interface{} is type-checked at runtime. It's type-safe because you can't, e.g., fish an integer out of an interface{} value that represents a string. The runtime won't allow it.
You can either extract a string, or the runtime will panic if you insist on extracting anything else with a plain type assertion (the two-value form just reports ok == false). Unless you use the unsafe package, in which case you explicitly want to skip type safety.
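Here is a small sketch of the run-time checks being described. The two-value ("comma ok") assertion and the type switch inspect the dynamic type stored in an interface{} without panicking, whereas the single-value form v.(int) panics on a mismatch. The helper names asInt and kind are hypothetical.

```go
package main

import "fmt"

// asInt uses the two-value type assertion: it never panics and reports
// whether the dynamic type stored in v is exactly int.
func asInt(v interface{}) (int, bool) {
	n, ok := v.(int)
	return n, ok
}

// kind uses a type switch to branch on the dynamic type stored in v.
func kind(v interface{}) string {
	switch v.(type) {
	case int:
		return "int"
	case string:
		return "string"
	default:
		return "other"
	}
}

func main() {
	var v interface{} = "hello"
	if _, ok := asInt(v); !ok {
		fmt.Println("v does not hold an int; it holds a", kind(v))
	}
	// n := v.(int) // single-value form: would panic here at run time
}
```

Either way, the string stored in v can never be reinterpreted as an int; the only question is whether the mismatch is reported as a boolean or as a panic.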
When "type safe" is mentioned without qualification, it almost always refers to static type safety. This is one of those times. So no, the sync.Map container is not typesafe like a regular map is.
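The missing static type safety is visible in sync.Map's method signatures: Store takes interface{} values and Load returns interface{}, so the compiler will accept a value of any type. A common workaround is a thin typed wrapper that keeps the single type assertion in one place. This is a sketch; the counters type is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// counters wraps sync.Map so callers only ever see string keys and int
// values; storing the wrong type becomes a compile-time error.
type counters struct {
	m sync.Map // holds string -> int, but sync.Map itself stores interface{}
}

func (c *counters) Store(key string, n int) { c.m.Store(key, n) }

func (c *counters) Load(key string) (int, bool) {
	v, ok := c.m.Load(key)
	if !ok {
		return 0, false
	}
	return v.(int), true // safe: Store is the only writer and it only stores ints
}

func main() {
	var c counters
	c.Store("builds", 3)
	n, ok := c.Load("builds")
	fmt.Println(n, ok) // 3 true
	// c.Store("builds", "three") // compile error, unlike raw sync.Map.Store
}
```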
"Static type safety" will generally mean type safety established at compile time. In this context, "static" tends to mean "compile time" and "dynamic" tends to mean "runtime".
For example, one might say "static code analysis" to mean analyzing code without running it, such as in a phase of compilation. In contrast, "dynamic code analysis" tends to mean actually running the code and making decisions based on what happens at runtime, such as in JIT (https://en.wikipedia.org/wiki/Just-in-time_compilation) techniques that identify hotspots.
According to whom? Certainly not according to wikipedia.
Given that it's a pretty big distinction, I would think it's on the speaker to be unambiguous and say "it's not statically type safe" rather than the ambiguous "type safe".
I've certainly seen my share of people claiming that "interface{} is just like void * in C" when they speak about Go's (lack of) type safety.
I also don't see how insisting on accurate and unambiguous terminology ticks people off enough to downvote. I imagine they think I said something much more incorrect than I did.
"type safe" has come to mean "statically type safe" over time in common conversation. You were downvoted because this intended usage was clear to the people who downvoted you and their perception of your comment was that it provided no value, e.g. was a nitpick.
I would note that the Wikipedia page does not take as strong a position as you seem to imply, reading:
> In the context of static (compile-time) type systems, type safety usually involves (among other things) a guarantee that the eventual value of any expression will be a legitimate member of that expression's static type. The precise requirement is more subtle than this — see, for example, subtype and polymorphism for complications.
Since golang is statically typed, type safety is generally understood to mean static type safety.
Well, non-type safety isn't really worth discussing. The big differentiators these days are: does it fail at compile time or run time? This is emphatically the latter category.
I googled a little bit and found some good info; I guess I had forgotten some of the concepts of mutex fairness/unfairness. I found a very nice explanation on cs.stackexchange:
"My understanding is that most popular implementations of a mutex (e.g. std::mutex in C++) do not guarantee fairness -- that is, they do not guarantee that in instances of contention, the lock will be acquired by threads in the order that they called lock(). In fact, it is even possible (although hopefully uncommon) that in cases of high contention, some of the threads waiting to acquire the mutex might never acquire it."
With that computer science clarification, I think the comment "Mutex is now more fair" and the detailed description "Unfair wait time is now limited to 1ms" make it a lot clearer.
Great improvement I think! It's one of those things that you don't notice until you have a bug, but it's really nice to never get that bug in the first place. =)
I was curious so I downloaded Go 1.4, and tested against 1.9 on an old version of a project I have (backend for Mafia Watch), about 30K lines of Go, including a few 3rd party dependencies.
Go 1.4: Around 2.1s
Go 1.9: Around 2.5s
So within 20% of 1.4, not bad. That's on an old MacBook Air, dual-core 1.7 GHz i7, 8 GB RAM.
And of course the binary performance and GC pause times w/ 1.9 will be much better.
Dave posted one about halfway through the release cycle, showing a small improvement. That was before the parallel-function compilation though, so things might have gotten better since then.
This has been discussed and the discussion derailed very quickly ("Go is a joke, my language has it bigger, blah blah".)
The reality is that the Linux kernel conflates processes and threads in its userspace APIs.
Locking to threads is a solution that works, but it also sucks and defeats the niceties of Go's N:M model. But that's the only way: if you use that broken system-call API, you should know better.
So this new concurrent map? Am I right in understanding it's designed for cases where you have a map shared between goroutines but where each goroutine essentially owns some subset of the keys in the map?
So basically it's designed for cases like 'I have N goroutines and each one owns 1/N keys'?
Also, do you have any good references on best practices for concurrent and parallel programming (in Go)? Just basic things: code I can copy and paste without it having obscure race conditions, because that use of a mutex is known to be correct, and something that lets me understand the limitations. I feel like it is very easy to do things "wrong" or miss some edge cases. In C++ I only ever coded single-threaded for this reason. Too many gotchas. Any help would be appreciated.
Locking is necessary for reads too, unless no more writes will be done. If you initialize/write into a map and then later only have concurrent reads from it, the program will run fine. If you try to write in the midst of those reads, it can crash.
In my understanding, a concurrent map is a map that can be accessed concurrently without explicit synchronization, not one where each goroutine owns a piece of it. Check Java's ConcurrentHashMap.
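The standard fix when reads and writes do overlap is to guard a plain map with a sync.RWMutex: writers take the exclusive lock, readers take the shared one. A copy-paste-sized sketch (the type safeCounts is hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

// safeCounts is a plain map guarded by a sync.RWMutex: reads can run in
// parallel with each other but never overlap a write.
type safeCounts struct {
	mu sync.RWMutex
	m  map[string]int
}

func newSafeCounts() *safeCounts {
	return &safeCounts{m: make(map[string]int)}
}

// Inc takes the exclusive lock because it modifies the map.
func (s *safeCounts) Inc(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key]++
}

// Get takes the shared (read) lock; many Gets may hold it at once.
func (s *safeCounts) Get(key string) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.m[key]
}

func main() {
	s := newSafeCounts()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s.Inc("hits")
				_ = s.Get("hits")
			}
		}()
	}
	wg.Wait()
	fmt.Println(s.Get("hits")) // 8000
}
```

Running code like this under the race detector (`go run -race` or `go test -race`) is the cheapest way to check that every access really goes through the lock.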
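Both readings above are partly right. Any goroutine may safely read or write any key of a sync.Map without extra locking. However, its documentation names two cases where it is meant to beat a map guarded by a Mutex: keys that are written once and then read many times (caches that only grow), and goroutines reading, writing and overwriting entries for disjoint sets of keys. Here is a sketch of the second case; fillDisjoint and count are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sync"
)

// fillDisjoint starts workers goroutines; worker w writes the keys
// w*perWorker .. w*perWorker+perWorker-1, so each goroutine owns a disjoint
// set of keys. No explicit mutex is needed: sync.Map synchronizes internally.
func fillDisjoint(workers, perWorker int) *sync.Map {
	var m sync.Map
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				m.Store(w*perWorker+i, w) // value records which worker wrote it
			}
		}(w)
	}
	wg.Wait()
	return &m
}

// count walks the map with Range and returns the number of entries.
func count(m *sync.Map) int {
	n := 0
	m.Range(func(_, _ interface{}) bool {
		n++
		return true // keep iterating
	})
	return n
}

func main() {
	m := fillDisjoint(4, 100)
	fmt.Println(count(m)) // 400
	if v, ok := m.Load(250); ok {
		fmt.Println("key 250 written by worker", v.(int)) // 2
	}
}
```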
Now make that a compile-time thing that happens on imports and doesn't generate temporary source files, and whoop, you have modules with generics. Oops, I forgot that there are some unsolvable obstacles to be resolved.
I have been wondering for quite some time how one could implement that. That is, not in the sense of how to code it, but rather how few changes one would have to make to the language to get as far as possible in the direction of genericity. There is some inspiration in Lua, Scheme48 and some dialects of ML (functors, I believe?) in the form of "higher-order modules", where a module (a package, in Go?) could have parameters that you'd have to supply when importing it. The things you could obviously supply would be at least constants, functions and types. (One might look at functions as types of computational processes, though, and at function signatures/interfaces as their respective type classes. This perspective could subsume functions as types, and perhaps integers as nullary functions returning an integer.) The question is how to reasonably express the import strings. The good thing about Go is that the language already has provisions for this, in the sense that the import string can technically be arbitrary. A subset of the reflection interface could additionally be evaluated at compile time to provide ad-hoc specializations of generic code by writing straightforward code that would be easily eliminated/specialized in a module instantiation (like loops over struct fields, etc.).
The interface to the feature is perhaps more important than the complexity of the implementation because it will affect many more people - only a few programmers will work on the compiler but tens of thousands of programmers will be writing code using it. I make no claims as to how complex this would be to implement, but it probably wouldn't stand out. The interesting thing is that this shouldn't necessitate any changes in the language of generic modules (no <>s and such). It merely parameterizes some types and constants in a module. As such, after a certain phase in compilation, the process is the same as for a non-generic module so perhaps it's a low-complexity change in the implementation, too (not just in the language spec).
[1] You can git-clone and try yourself: https://github.com/gravitational/teleport