We're using grpc-java in production for some of our backend systems, slowly replacing our old netty/jackson based system using JSON over HTTP/1.1.
The performance is good, and it's nice to have proto files with messages and services, which act both as documentation and as a way to generate client and server code. Protobuf is much faster, produces less garbage and is easier to work with than JSON/jackson. The generated stubs are very good and it's easy to switch between blocking and asynchronous requests, which still only require a single TCP/IP connection.
We've had two performance problems with it:
1. Connections can die in a somewhat unexpected way. This turned out to be caused by HTTP/2.0, which only allows 1 billion streams over a single connection. Maybe not a common issue, but it hurt us because we had a few processes reaching this limit at the same time, breaking our redundancy. It's easy to work around, and I believe the grpc-java team has a plan for a fix that would make this invisible to a single channel.
2. Mixing small/low-latency requests with large/slow requests caused very unstable latency for the low-latency requests. Our current work-around is to start two grpc servers (still within the same java process and sharing the same resources). The difference is huge, with 99th-percentile latency going from 22ms to 2.4ms just by using two different ports. Our old code with JSON over HTTP/1.1 implemented using jackson and netty didn't suffer from this instability in latency, so I suspect grpc is doing too much work inside a netty worker or something. I haven't yet tested with grpc-java 1.0, which I see has gotten a few optimizations.
Still, these have been minor issues, and we're happy so far. The grpc-java team is doing a good job taking care of things, both with code and communication.
> This turned out to be caused by HTTP/2.0 which only allows 1 billion streams over a single connection.
Hilarious. People called this issue out as an obvious flaw when HTTP/2.0 was first proposed, got ignored, and here the issue is.
For those unfamiliar:
HTTP/2.0 uses an unsigned 31-bit integer to identify individual streams over a connection.
Server-initiated streams must use even identifiers.
Client-initiated streams must use odd identifiers.
Identifiers are not reclaimed once a stream is closed. Once you've initiated (2^31)/2 streams, you've exhausted the identifier pool and there's nothing you can do other than close the connection.
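To put that limit in numbers, here is a rough back-of-the-envelope sketch in Python. The request rates are made-up illustrations, not figures from anyone in this thread, and the function names are hypothetical.

```python
# Stream identifiers are 31-bit, and a client may only use the odd ones.
MAX_STREAM_ID = 2**31 - 1


def client_stream_ids_available() -> int:
    """Count the odd identifiers in 1..MAX_STREAM_ID."""
    return (MAX_STREAM_ID + 1) // 2


def days_until_exhaustion(requests_per_second: float) -> float:
    """Days before one connection runs out of client stream identifiers,
    assuming one stream per request at a constant rate."""
    seconds = client_stream_ids_available() / requests_per_second
    return seconds / 86_400


if __name__ == "__main__":
    print("client stream ids:", client_stream_ids_available())
    # Illustrative rates only.
    for rps in (100, 1_000, 10_000):
        print(rps, "req/s ->", round(days_until_exhaustion(rps), 1), "days")
```

At a steady few thousand requests per second, a long-lived connection would hit the ceiling within days, which may be why several processes started together can reach it at about the same time.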
For comparison, SSH channels use a 32-bit arbitrary channel identifier, specified by the initiating party, creating an identifier tuple of (peer, channel). Channel identifiers can be re-used after an existing channel with that identifier is closed.
As a result, SSH doesn't have this problem, or the need to divide the identifier space into even/odd (server/client) channel space.
Well, that's half of the downside presented; the other half is that it's split by server/client connections. I assume this was done because it simplifies the tracking of the next stream identifier, because you can just keep a counter and increment, rather than a table of used streams to check a new random identifier against?
So, that's the other side of the argument? I assume there was at least a reason they specced it this way originally, even if under comparison those reasons wouldn't have held up. Was there any justification, or was it literally ignored?
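The trade-off raised above, a bare counter versus a table of identifiers in use, can be sketched roughly like this. Both classes are hypothetical illustrations; neither is taken from an HTTP/2 or SSH implementation, and the lowest-free-slot reuse policy is just one option an SSH-style peer could pick.

```python
class MonotonicIds:
    """HTTP/2-style: a counter stepping by 2; closed ids are never reclaimed."""

    def __init__(self, first: int, limit: int):
        self._next = first
        self._limit = limit

    def open(self) -> int:
        if self._next > self._limit:
            raise RuntimeError("identifier space exhausted; reconnect")
        stream_id = self._next
        self._next += 2
        return stream_id

    def close(self, stream_id: int) -> None:
        pass  # no bookkeeping needed, but nothing comes back either


class ReusableIds:
    """SSH-style: track open ids and hand released ones out again."""

    def __init__(self, limit: int):
        self._limit = limit
        self._open = set()
        self._free = []  # ids released by close()
        self._next = 0   # next never-used id

    def open(self) -> int:
        if self._free:
            channel_id = self._free.pop()
        elif self._next <= self._limit:
            channel_id = self._next
            self._next += 1
        else:
            raise RuntimeError("too many channels open at once")
        self._open.add(channel_id)
        return channel_id

    def close(self, channel_id: int) -> None:
        self._open.remove(channel_id)
        self._free.append(channel_id)
```

The counter needs only a single integer of state, while the reusable scheme pays for a table but is limited only by how many channels are open at once, not by how many were ever opened.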
We use it heavily on multiple projects across languages, and for the most part it works very well. We've had some pain about sharing proto definitions across languages and keeping them in sync. It's probably a much smaller problem when you've got a company-wide monorepo like Google, but you'll definitely have to be vigilant about your build processes to make sure you have the latest definitions shared.
Some of the language bindings (Ruby) started off feeling experimental quality when we began the project, but overall it's been a huge win for us versus HTTP+JSON. I'm sure a non-zero portion of the benefit has been using protobufs at all, but gRPC gives us a great way to generate clients for every language we use without worrying.
Could you expound upon the problem of keeping your protocol definitions in sync? In my experience this is the strength of protocol buffers: if you follow a few rules, your systems can successfully be decoupled. Some of the rules are never re-using a tag number and never changing a type in an incompatible way (e.g. string->bytes might be ok, but int32->bytes is not).
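As a rough illustration of those rules, the following Python sketch compares two versions of a message described as plain dicts. The dict format, the function name and the rename heuristic are all assumptions made for this example; this is not how protoc or any real linter represents schemas. The wire-type table is partial. Sharing a wire type is necessary but not sufficient for compatibility, so a clean result here is not a guarantee.

```python
# Partial table of protobuf wire types: 0 = varint, 1 = 64-bit,
# 2 = length-delimited, 5 = 32-bit.
WIRE_TYPE = {
    "int32": 0, "int64": 0, "uint32": 0, "uint64": 0, "bool": 0,
    "fixed64": 1, "double": 1,
    "string": 2, "bytes": 2,
    "fixed32": 5, "float": 5,
}


def find_violations(old: dict, new: dict) -> list:
    """Compare two schema versions given as {tag: (field_name, type)}.

    Hypothetical format, for illustration only.
    """
    problems = []
    for tag, (old_name, old_type) in old.items():
        if tag not in new:
            continue  # removing a field is fine if the tag is never reused
        new_name, new_type = new[tag]
        if WIRE_TYPE[old_type] != WIRE_TYPE[new_type]:
            problems.append(
                f"tag {tag}: {old_type} -> {new_type} changes the wire type"
            )
        elif old_name != new_name:
            # Heuristic: a rename is wire-compatible, but may hide tag reuse.
            problems.append(
                f"tag {tag}: {old_name} -> {new_name}, check for a reused tag"
            )
    return problems


if __name__ == "__main__":
    v1 = {1: ("name", "string"), 2: ("age", "int32")}
    v2 = {1: ("name", "bytes"), 2: ("age", "bytes")}
    for problem in find_violations(v1, v2):
        print(problem)
```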
Yeah! I think a few responses have covered this below, but I'll give you our spin (and why it's painful, compared to what people have offered up).
Most of the projects that we're integrating gRPC into are existing codebases that have their own build tools that are (mostly) in isolation. JSON schemas have been agreed on beforehand, and there are separate client implementations in different languages that basically exist as independent units.
By adding protobufs to this process, the "JSON schemas that have been agreed on" become protobuf definitions - which is _fantastic_ for development teams, because they have a single spec to work from, and there is no ambiguity (or, significantly less).
The challenge comes when we are trying to generate gRPC clients in Go, Ruby and Python for the same protobuf file - in order to do it in an automated fashion without a 'monorepo', we need to create a build system that pulls this protobuf from a central place and generates the client, which doesn't exist right now.
It's not a huge challenge to ensure services can communicate at all - as you said, protobufs have thought of this and have an extensive amount of decoupling built in. When we're working on adding new features however, we need to have a place to keep the "gold master" of protobufs, and grab it for all of our projects to build+deploy at once, which is where the above becomes challenging.
Not an un-solvable problem, and different languages have different tooling for this. We've settled on placing the proto definitions in the "server" side (most of our interactions are fairly well modeled by client/server), and then updating the clients as-necessary, as we can deploy server changes without needing to update the clients immediately.
There's no need to pull the proto file on every build. Proto also has a set of rules for how to maintain wire-compatibility across versions[1]. Following those rules and distributing the definition only when you need new fields should be sufficient.
That said, if you've got a set of shared proto definitions, you should probably either go to a monorepo, or share the shared bits with a git submodule. It doesn't prevent you from needing to follow those conventions, but it does make it far easier to debug when things changed.
That problem can be avoided without monorepos. You primarily need a way to declare a dependency from one package on another at build time, such that the appropriate release gets pulled in. For example, maybe you've depended on version 2 of the interface definition; and in that case the build system fetches the artifacts for interface 2 at build time when building the client. Maven for Java works this way.
Ideally this system would also allow the package owner to release updates within an existing version if they wish. For example, backward-compatible changes to the service interface can be released while keeping the major version 2. In this way, clients automatically consume safe updates, while incompatible or risky changes can be given a major version bump (e.g. to 3). Consumers who want to pin the interface to a specific version like 2.5.1 could do so, in some build systems, though dependencies this specific are rarely useful or a good idea. In my experience it's best for the contract between producer and consumer to be explicitly versioned at the "major version" level, and only implicitly versioned (meaning updates are automatic) at the minor version level.
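The resolution rule described above (pin the major version, float on everything below it) can be sketched in a few lines of Python. The function names and the version list are hypothetical, and real build tools such as Maven have their own range syntax and resolution rules that this does not attempt to reproduce.

```python
def parse(version: str) -> tuple:
    """Turn "2.5.1" into (2, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(available: list, major: int) -> str:
    """Pick the newest published release within the pinned major version."""
    candidates = [v for v in available if parse(v)[0] == major]
    if not candidates:
        raise LookupError(f"no release with major version {major}")
    return max(candidates, key=parse)


if __name__ == "__main__":
    # Made-up list of published interface releases.
    published = ["1.9.9", "2.5.1", "2.10.0", "3.0.0"]
    print(resolve(published, major=2))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "2.5.1" would sort above "2.10.0".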
That seems like a "doctor it hurts when ... " scenario, and I don't see why it's specific to protobuf. Any IDL managed that way would have the same problem.
I'm not using it outside of Google but I will start for some personal projects with this announcement. I can say that it is one of the best parts of our tech stack and one of the great things about building systems here.
I've used it. It's easy to use and very capable. My favourite feature is that it supports streaming objects, in both directions. In other words you can do an RPC call where the input and/or output is an asynchronous stream of objects.
Every RPC system needs this, or you end up with hacks like HTTP long polling.
My least favourite feature is that it is tied to HTTP2. I'm not sure what you're supposed to do if you are running on a microcontroller.
Agree on this point. The most innovative feature is streaming, which enables some very powerful scenarios.
Tying it to HTTP/2, and especially the way that was done, is also not my cup of tea. E.g. if it hadn't chosen to use HTTP trailers (which are mostly unsupported), it could be implemented with a lot more HTTP libraries. It's also sad that it doesn't run in current browsers, because of the lack of trailer support as well as streaming responses there. With a little more thought (maybe choosing multiple content types/body encodings) this could have been supported, at least for normal request/response communication without streaming.
Regarding microcontrollers:
It should be possible to implement HTTP/2 and gRPC also on microcontrollers, but imho it will neither be easy nor necessarily a good choice. Implementing HTTP/2 with multiplexing will need quite a lot of RAM on a constrained device, especially with the default values for flow control windows and header compression. You can lower these through SETTINGS frames, but that might kill interoperability with HTTP/2 libraries that don't expect remotes to lower the settings or to reset connections while settings have not been fully negotiated.
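For a sense of what lowering those values looks like on the wire, here is a minimal Python sketch that serializes an HTTP/2 SETTINGS frame by hand. The identifiers and defaults in the comments come from the HTTP/2 specification; the lowered values and the helper name are arbitrary choices for illustration, not recommendations, and this is not the API of any HTTP/2 library.

```python
import struct

# Setting identifiers from the HTTP/2 specification.
HEADER_TABLE_SIZE = 0x1       # default 4,096 bytes
MAX_CONCURRENT_STREAMS = 0x3  # default: unlimited
INITIAL_WINDOW_SIZE = 0x4     # default 65,535 bytes

SETTINGS_FRAME_TYPE = 0x4


def settings_frame(settings: dict) -> bytes:
    """Serialize a SETTINGS frame from {identifier: value}."""
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    payload = b"".join(
        struct.pack(">HI", ident, value) for ident, value in settings.items()
    )
    # Frame header: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id.
    # SETTINGS always travels on stream 0.
    length = struct.pack(">I", len(payload))[1:]
    header = length + struct.pack(">BBI", SETTINGS_FRAME_TYPE, 0x0, 0)
    return header + payload


if __name__ == "__main__":
    # Arbitrary small values a constrained device might advertise.
    frame = settings_frame({
        HEADER_TABLE_SIZE: 512,
        MAX_CONCURRENT_STREAMS: 2,
        INITIAL_WINDOW_SIZE: 1024,
    })
    print(len(frame), frame.hex())
```

The lowered values only bind the peer once it has processed the frame, so a device still has to cope with data sent under the defaults in the meantime, which is the interoperability risk mentioned above.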
So gRPC evolved out of Stubby. An excellent show of force would be to announce that Stubby has been internally replaced by gRPC, so that the "gRPC is internet scale" assertion can be more than just a gimmick. Knowing nothing of the first one and very little of the second, I imagine it would be quite a significant task, so I have to ask: do you plan to internally run with the stuff you open-sourced? What is missing?
Been a couple years since I worked at Google, but when I was there, Stubby was pretty intimately connected with Google's networking fabric, datacenter hardware, and internal security & auditing needs. None of this is at all useful to external customers - you're not running on Google's proprietary hardware, you don't interface with their monitoring & auditing systems, etc.
As an ex-Googler, using gRPC feels just like using Stubby: the interfaces are the same, the serialization code is the same, the only thing different is the networking code and transparent hooks into other systems.
gRPC faces a longer road to feature parity with Stubby. For external adopters this is not an issue, so it makes sense that it would be available to the public in advance of its adoption inside Google.
Not to mention the internal infra grows all these knobbly bits as one-off feature requests for large/influential teams, which aren't necessarily useful outside the goog.
I'm glad they are releasing version 1.0 but I feel that the maintainers of the Go gRPC team have a lot of work to rebuild trust.
I've seen backwards-incompatible changes made by core Go team members and core gRPC maintainers, where the API is statically consistent but actually behaves in completely different, broken ways. One of these was big enough that I said screw it and am moving that application away from gRPC.
I've seen multiple issues where the library you generate against ends up being incompatible with the library you link against at build time. They finally added a version check as part of the build/run step to prevent this from causing silent runtime errors.
Maybe in a year gRPC will actually be stable, maybe it has been over the last three months. I don't really know but I gave up, am moving my applications off of it and actively pushing for coworkers to do the same.
"Rebuild trust"? This is the first stable release. Every single release before then was called a "Development Release" (and Protobuf 3.0 only came out of beta less than a month ago), so of course they've been breaking things before now.