by acln 2563 days ago
The most important property of Go for me is that the language is not red/blue. [1]

This enables I/O interfaces to be truly universal, covering just about everything from files on disk, to pipes, to in-memory buffers, to sockets. This feature facilitates a style of, for lack of a better word, generic programming that is hard to come by in other systems.

I find basically any other language or platform except Go lacking and unpleasant in this regard, due to the viral nature of asynchronous functions, which never disappears entirely, no matter how much syntactic sugar is sprinkled on top of it.

Things like mismatch between event reactors or asynchronous frameworks do not exist in Go. Interfaces Just Work, and the entire ecosystem uses them.

[1] http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...

6 comments

> The most important property of Go for me is that the language is not red/blue.

This initially attracted me to Go as well. Unfortunately, in production apps your functions get colored by their `context.Context` argument to support cancellation. `Context` is viral because it needs to be passed from `main` down to virtually every blocking function.

I've never been a fan of the "what color is your function" essay, because it implies that Go is in some sort of unique space. In fact, Go just uses threads. There's no semantic difference between Go and pthreads. The only difference is that Go has a particularly idiosyncratic userland implementation of them.
While the "what colour is your function" essay highlights the author's pet peeve, asynchronous functions, I always understood it to be less about the underlying implementation than about syntax and semantics; the whole point is that control flow ends up infecting function-level semantics. That problem extends to anything else a language can treat as "coloured".

For example, in Haskell, side-effectful operations end up being "infected" with the IO monad. This means you're not free to mix and match functions — the moment you need to call some IO function, all callers up the stack need to be monadified, too. This might be a late change — suddenly you need a logger or a random number generator, and it has to be passed all the way up from the outermost point that uses monads. In practice, monads are so deeply ingrained in Haskell now that most devs probably don't see this as a colour problem.

Multi-value-returning functions in Go are another example of colour. The only way to use the return values of a Go function that returns a tuple is to assign them:

  value, err := saveFile()
  if err != nil { ... }
This means functions like these aren't composable. I can't do saveFile().then(success).fail(exit) or whatever, like you can in Rust. The moment you have a function returning more than one value, your only option is to create a variable. It's weird.

Interestingly, you can do this, but I've never done it and never seen it in the wild:

  func foo(v int, err error) { ... }
  func bar() (int, error) { ... }
  func main() {
    foo(bar())
  }
I figure the main feature of goroutines is that it is considered acceptable, for whatever reason, that a library function spawns helper goroutines without telling you (as long as it reins them in somehow and they don't leak or anything weird like that). If, e.g., a random C or Rust library spawned threads for random tasks without explicitly being a concurrency thing, it would probably raise a lot more eyebrows, no?

In the end this is probably because of the possibly superstitious belief that goroutines are free whereas threads need to be carefully budgeted for, and maybe it sits somewhere between premature optimization and designing for very niche scalability requirements, but subjectively the result is still that goroutines are "available" in a lot more situations than boring, pedestrian OS-level threads.

I feel like that counts as a semantic difference, even if it might be social more than technical.
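
To make the "helper goroutines that get reined in" pattern concrete, here is a small sketch. The function `sumSquares` is hypothetical; from the caller's side it looks like an ordinary synchronous call, yet it fans work out to goroutines and waits for all of them before returning, so none outlive the call.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares (hypothetical) splits nums across helper goroutines and joins
// them before returning. Each goroutine writes only to its own slot of
// partial, so no locking is needed.
func sumSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Worker w handles indices w, w+workers, w+2*workers, ...
			for i := w; i < len(nums); i += workers {
				partial[w] += nums[i] * nums[i]
			}
		}(w)
	}
	wg.Wait() // rein the helpers in: nothing is left running after this

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 3)) // prints "30"
}
```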

An implementation with buy-in across the entire ecosystem and language so that you don’t have some systems using threads and other systems using futures and other systems using different reactors, etc.

Also known as the point of the article.

Additionally, that implication is entirely of your own creation. The article explicitly lists many languages besides Go:

> Three more languages that don’t have this problem: Go, Lua, and Ruby.

Perhaps you just have an anti-Go bias?

Last I tried it, if I had a million goroutines calling stat(), Go would attempt to spawn a million kernel threads. So I rolled my own rate limiter, bleh.

Is that still the case? (Happy if not)

If it is still the case, is there a standard solution or is it up to the app author?

It's up to the application author in that case, unfortunately. stat() enters the kernel and resolves in one shot, so it requires a whole thread. I haven't read into it very carefully, but on Linux, perhaps the new io_uring business is going to change this state of affairs. For now, however, you need a semaphore of your own.
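
One way such a hand-rolled semaphore might look, as a minimal sketch: a buffered channel used as a counting semaphore so that at most `limit` calls to `os.Stat` are in flight at once. The name `statAll` and its parameters are hypothetical, and `limit` is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// statAll (hypothetical) stats every path with at most limit calls to
// os.Stat in flight, and returns how many paths were stat'ed successfully.
// limit must be >= 1, otherwise the first acquire would block forever.
func statAll(paths []string, limit int) int {
	sem := make(chan struct{}, limit) // buffered channel as counting semaphore

	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for _, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // acquire before spawning: bounds goroutines too
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if _, err := os.Stat(p); err == nil {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}(p)
	}
	wg.Wait()
	return ok
}

func main() {
	paths := []string{".", os.TempDir(), "/no/such/path/hopefully"}
	fmt.Println(statAll(paths, 2))
}
```

Acquiring the slot before the `go` statement, rather than inside the goroutine, keeps the number of live goroutines bounded as well as the number of blocking system calls.
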
I wish I understood what problems people solve day in and day out such that they need to call IO in the middle of Dijkstra’s algorithm. Most "business logic" ought to be pure functions over persistent data structures with no IO other than the occasional logging. At the last possible moment, 5% of the code wires in IO in some boilerplate-y way. Is that sufficiently pervasive to worry about?
> what problems people solve that they need to call IO in the middle of Dijkstra’s algorithm.

I worked on several problems of this nature at Twitter in 2012. Hopefully there’s a better way to solve them in 2019...prolly not, but maybe.

say you want to find the median of the number of followers a person on twitter has. so that should be easy - make 1 dataframe with follower count of each bloke and call median() - well, there’s some 300,000,000 blokes, so not that easy :) You have to make a dataframe via ETL - reading & writing to disk 100s of times, loading a few thousand users each time, distributed median computation. so a silly sub-second median query took 2 months to code up & debug & ran for a few hours due to so much IO.

another much harder problem - you want to find the median number of hops between one user & another. so now you have 300m x 300m tuples as your result - where & how to store them is in itself a monstrous challenge. but how the heck do you even compute the result ? you read in one tweet from john to steve, so that’s 1 hop from john to steve & viceversa. you then read a second tweet from steve to mary, so that’s 1 hop from steve to mary & viceversa, 2 hops from john to mary & viceversa. in this manner you read 100s of billions of tweets & keep updating hopcount. somewhere in there john sends mary a tweet - oh fuck now the hopcount is 1, not 2. this will then change lots of other hopcounts. in theory there are nice graph algos for this sort of thing. but in reality, your data is billions of tweets constantly increasing, stored in distributed compute clusters across the planet & just getting a handle on all this can be a 6 month project for some lucky scientist who got to work on this.

> I worked on several problems of this nature at Twitter in 2012. Hopefully there’s a better way to solve them in 2019...prolly not, but maybe.

Okay, so Twitter has scale. To a first, second and third order engineering approximation--nobody else does.

If you are a mere mortal writing practically anything, pull it all into memory, operate on it to create another copy, destroy the original copy (or let GC kill it).

Embedded programmers might get a pass on this given limited memory (32K RAM)--but that same kind of attitude is getting more and more essential as you start getting Big/Little core mixes on the same chip.

Computers are mind-bogglingly powerful.

I have been completely stunned at how many transactions Nginx+Django+PostgreSQL can actually handle before you need to start thinking about "scaling".

> I wish I understood what problems people solve day in and day out such that they need to call IO in the middle of Dijkstra’s algorithm.

For Dijkstra, imagine a very large graph that cannot fit into memory, where you'll need to go out to disk or network to compute the distance of two nodes (or fetch part of the graph, etc).

The business logic many (perhaps most) programmers work on is primarily occupied with gluing together multiple state stores (databases, caches, message queues), running some very simple computations, and writing the output to some IO sink (often a web response).

Sure, you could extract the computations part out. But that barely moves the needle on testability/cleanliness, because most of the business (or business-value-driving) logic is the data flow management--highly stateful IO coordination.

Yes. So much business logic requires IO. Pure business logic is a great dream, but it's a dream.
I/O, both disk and network, can happen almost anywhere because data doesn't always fit in memory, may be streamed from a remote source, and I/O handling cannot always be deferred indefinitely. A significant part of systems software design is accounting for this reality.
One of the primary issues I run into with this is that some parts of the domain logic determine what data you need from the database, and this logic can't be easily moved into SQL (or however you're specifying what data you need back from the data source).
That link has a great punchline. "All of this is easy with threads which never cause any problems!" I haven't tried Go, but this seems certain to be an exaggeration.
> I find basically any other language or platform except Go lacking and unpleasant in this regard, due to the viral nature of asynchronous functions

That's...odd, because while Go lacks that issue, it was by no means unique among languages or frameworks in that regard when it was introduced, and isn't now, either.

This is the same reason why I love C, and why I think that, for projects larger than 1000 lines, it allows me to be MORE productive than more sophisticated languages. It possibly even results in a smaller line count, but at the very least in less accidental complexity.

One small red/blue pain point in C is varargs - for most varargs functions I have to make a "..." and a "va_list" version.

When using a C library, whose code is responsible for allocating objects? Whose is responsible for freeing said objects? That alone leads to a LOT of issues.

Not to be evangelical, but give Rust a look.

Seriously, all these language mechanisms like RAII make it a lot harder to write modular code. Look at the mess that C++ got itself in, with its constructors, default constructors, move constructors, rvalue references, exceptions (required as a consequence of constructors), and what not. It's hard to believe Rust could make it significantly less painful.

Programming is mostly not about initialization and deinitialization. If it is, you're doing it wrong, you have too many small objects.

Yes, stack allocated STL containers can be nice for quick "scripting". But I will happily write a few function local deinitialisations to enjoy much less convoluted and interdependent, slowly compiling code.

Rust makes it less painful by not having constructors, default constructors, move constructors, rvalue references, or exceptions.