I believe this little piece answers your question:
> We likely could have addressed this behavior in Image Proxy, but we had been experimenting with using more Go, and it seemed like a good place to try Go out.
At the heart of if, they were looking for opportunities to use more Go in their stack and they deemed this situation as a fit.
3. More employees knowledgeable about Go than Python
4. More enthusiasm (and therefore faster velocity) around Go development.
The blog post was about the engineering challenges they faced and how they solved them and I think it was a great write-up in that regard. The post wasn't about why they switched this service from Python to Go.
It might be, then again I see a lot of wheel reinvention in tech / NIH syndrome.
I'm the kind of hacker who if a service runs out of memory every 2 hours, writes a crontab to restart it every hour after X random minutes so they don't all restart at the same time. It gets a lot of eye rolls from the other engineers searching for perfection, but it tends to produce services quickly that are highly reliable.
And look now the engineers who like chaos monkey don't even have to set that up. It's built in.
Part of it is just Discord’s operating scale. They are already leveraging Elixir clustering to an extremely high rate of concurrency and when you start thinking about problems from that standpoint Go becomes a much more natural fit within the stack for low level micro services.
I agree that tech in general and Silicon Valley in particular has a lot of NIH, but I also think this isn't really the case here. In particular, we're discussing a Python service that performs slow image resize calls. They would have (probably, speculation on my part/experience) had to do 2 things:
1. Add profiling and telemetry to their Python code. Refactor the codebase based on insights from this.
2. Write a C<->Python interop for their image libraries.
I can't see the cost of #2 being any different than the cost they paid on writing it in Go. As for #1, depending on how the code is structured, a rewrite may have been less time than profiling spaghetti code. At that point, it depends on how much Go experience the team has.
Yeah, either a good Python JIT or Cython would have been fine honestly. I never understood the obsession with "python is slow" when you can recover almost all of the performance with a good JIT or Cython (in many/most cases).
Yes. Or simply profiling the app and optimizing sore spots would have helped too. It seems to me there was no real reason to move from Python to Go, apart from preference.
I don't think the article gives us the data to know this. Where did the latency spikes in the original implementation come from? Would fixing them have required a complete rewrite of the Python parts anyways?
I understand this is a personal preference, but having spent a good amount time with both Python and Go, FWIW I would also choose Go if I were solving the same problem.
From reading this, seems HTTP handling speed was important to them? which Go is probably better for. Also, interfacing Python to C/C++ is pretty unpleasant.
> We likely could have addressed this behavior in Image Proxy, but we had been experimenting with using more Go, and it seemed like a good place to try Go out.
At the heart of if, they were looking for opportunities to use more Go in their stack and they deemed this situation as a fit.