by Thorrez 1418 days ago
This doesn't match what I see in the gRPC spec. It says every message must be length-prefixed.

https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2....

Disclaimer: I don't know much about gRPC.


I've spent a fair bit of time working with gRPC, and you're correct - gRPC's length-prefixing makes it easy to detect when individual messages are terminated early. You do still need some way to detect streams that terminate unexpectedly on message boundaries - perhaps you could rely on HTTP/2 EOS bits as evidence of an application-level success, but you need some equivalent of trailers to communicate the details of any errors that occur midway through the response stream anyways.

First, we need to clarify. The problem is that you cannot use gRPC from JavaScript (yes, there are unofficial, sketchily supported hacks, but read on for why they're required).

The article explains what, in the author's opinion, caused this, but not very obviously. The problem is protobuf encoding. It's key-length-value (key identifies the field that follows, length is the length of the value, value is the value; the length is only present for length-delimited fields such as strings and nested messages). A message is thus "KLVKLVKLVKLVKLV".

The second thing is that repeating a key is always valid (not just when it's declared as an array of values). If a field is repeated but it's not an array, then the newer value overrides the previous one.

(This was done so that if you have 20, or 2000000, massive protobuf files listing, say, URLs visitors used, and you need to combine them, you can just concatenate the bytes together and read in the result as a valid protobuf. It also means you can do streaming reads of any protobufs coming in.)

So:

1) you don't know when the message ends because you don't have a length field for the entire message (only for individual fields)

2) you don't know the message ends after a given number of fields, for 2 reasons. A, some fields are optional, and may or may not be present. B, even if all fields were sent, a newer version of an already-sent field might be sent to override the previously sent value.
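
To make the two points above concrete, here is a minimal sketch of a protobuf wire-format reader in Python. It is hand-written for illustration and is not the official protobuf library; the helper names `read_varint` and `decode_fields` are made up, and it only handles wire types 0 (varint) and 2 (length-delimited).

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(buf):
    """Decode a flat message into {field_number: value}.

    A key that appears again overwrites the earlier value, mirroring
    last-one-wins for non-repeated scalar fields.
    """
    fields = {}
    pos = 0
    # The only stop condition is running out of bytes: nothing in the
    # encoding itself says "the message is complete".
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: no length, just the value
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: length, then value bytes
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[field_number] = value
    return fields


# Field 1 (varint) = 150, then field 2 (bytes) = b"hi"
first = bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69])
# Field 1 (varint) = 7
second = bytes([0x08, 0x07])

# Concatenation is a valid message; the later field 1 overrides the earlier one.
merged = decode_fields(first + second)

# A prefix that happens to end on a field boundary also decodes without error.
prefix = decode_fields(first[:3])
```

Here `merged` ends up with field 1 set to 7, and `prefix` decodes cleanly with only field 1 present, so the reader cannot tell a complete message from one that was cut off between fields.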

You're right, of course, that the problem can be fixed by only allowing a single top-level message that is decoded in a nonstandard way (and frankly support for this could be added to the official protobuf libraries and it can be made to work by making this requirement optional ... Even concatenating can still be possible that way)

The problem here is rigid, immovable adherence to their own standard (i.e. not incorporating a change like demanding KLV on the top-level message in an RPC connection, or adding a special concat-compatible, one-field-at-a-time decoding mode and making that mode required for the HTTP case, because an architectural decision made 15 years ago said not to do this).

This was an organisational problem. Not that they don't trust each other (obviously a browser developer doesn't trust many people; security really demands they don't. This is not where the problem is). The gRPC team failed to consider the fallout of choosing the nuclear option of just dropping out of browsers in order to satisfy an old internal requirement without modifying their design. They chose this outcome because they couldn't deal with achieving 99.9% of their aims as opposed to 100% ... and so they missed one of their main aims; let's say they achieved 50% instead.

I understand the points you're raising; I'm saying that gRPC enveloping solves them. Once messages are enveloped, you _do_ have a length field for the entire message.
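
As a rough sketch of that envelope in Python: the gRPC-over-HTTP/2 spec defines a length-prefixed message as a 1-byte compressed flag, a 4-byte big-endian length, and then the message bytes. The function names `frame` and `read_frames` are made up for this example, and it works on an in-memory byte string rather than a real HTTP/2 stream.

```python
import struct


def frame(message: bytes, compressed: bool = False) -> bytes:
    """Wrap a serialized message in a 5-byte prefix: flag + big-endian length."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message


def read_frames(stream: bytes):
    """Split a byte string into (compressed, message) pairs.

    Raises ValueError if the bytes end partway through a prefix or a body.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        if len(stream) - pos < 5:
            raise ValueError("stream ended inside a 5-byte prefix")
        flag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        if len(stream) - pos < length:
            raise ValueError("stream ended inside a message body")
        messages.append((bool(flag), stream[pos:pos + length]))
        pos += length
    return messages


wire = frame(b"abc") + frame(b"de")
complete = read_frames(wire)       # both messages
boundary_cut = read_frames(wire[:8])  # cut exactly after the first message
```

Truncation inside a message is detected (`read_frames(wire[:-1])` raises), but `boundary_cut` decodes without error, which is the remaining case where something beyond the length prefix is needed.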

The article is, IMO, somewhat misleading - it discusses the issue without mentioning gRPC envelopes at all, and it seems pretty clear that envelopes were designed (in part) to address this exact issue.

Another option is simply to reserve a key (or multiple) for providing stream/transport specific metadata which should be stripped out before handoff to the client, such as allowing you to send an "end" marker. Now you're not depending on the transport layer cooperating. It's not a particularly hard problem.

That has the downside that you're now limiting what protobuf payloads you can send. You need to have the inner protocol (protobuf) cooperate with the outer protocol. That also makes it difficult to switch to a different type of payload besides protobuf, since even if you could convince the protobuf standard to reserve a specific key for the outer protocol, you might not be able to convince the standard for a different protocol to reserve that for you. The article says "gRPC is definitely not Protobuf specific".

It's like an intrusive linked list vs std::list. Sure you can do intrusive linked lists, but it means you have to clutter up your object with info about how it's stored. It mixes the layers.

I think most people would say that not being able to use gRPC at all on the web is not exactly the superior option/outcome ... I'm not sure why a "pure design" (non-mixed layers) matters when the result is non-functional.

Just open a second websocket for the second protocol. Or use WebRTC. Or ... most protocols have these channels, which means mixing payloads is not really that useful. It doesn't buy you anything over the situation where you're not mixing protocols.

I'm not saying the current situation is better than your suggestion. I'm saying there are better ways to fix the current situation than your suggestion.

My idea of a better way would be:

* If you just need a boolean of success vs failure: use END_STREAM vs RST_STREAM.

* If you additionally need metadata of failure reason: use the existing length prefixes that gRPC has, and additionally add a bit to indicate data vs trailer. Then implement your own trailers inside the HTTP2 stream to indicate success vs failure and failure reason. Sure these trailers won't get HTTP2 compression like native HTTP2 trailers, but that shouldn't be a big problem.
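
The second option could look roughly like the Python sketch below. Everything specific here is an assumption made for illustration: the choice of the high bit of the flag byte (`TRAILER_BIT`), the `status`/`message` trailer names, the text encoding of the trailer body, and the function names. It reuses the 1-byte flag plus 4-byte big-endian length prefix that gRPC already has.

```python
import struct

TRAILER_BIT = 0x80  # hypothetical: high bit of the flag byte marks a trailer frame


def data_frame(message: bytes) -> bytes:
    """Ordinary length-prefixed message with the trailer bit clear."""
    return struct.pack(">BI", 0, len(message)) + message


def trailer_frame(status: int, detail: str = "") -> bytes:
    """In-stream trailer: same prefix, trailer bit set, header-like text body."""
    body = f"status: {status}\r\nmessage: {detail}\r\n".encode("utf-8")
    return struct.pack(">BI", TRAILER_BIT, len(body)) + body


def read_response(stream: bytes):
    """Return (messages, trailers); trailers is None if no trailer frame arrived."""
    messages, trailers = [], None
    pos = 0
    while pos < len(stream):
        if len(stream) - pos < 5:
            raise ValueError("stream ended inside a 5-byte prefix")
        flag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        body = stream[pos:pos + length]
        if len(body) < length:
            raise ValueError("stream ended inside a frame body")
        pos += length
        if flag & TRAILER_BIT:
            trailers = {}
            for line in body.decode("utf-8").split("\r\n"):
                if line:
                    name, _, value = line.partition(": ")
                    trailers[name] = value
            break  # the trailer frame is the last frame of the response
        messages.append(body)
    return messages, trailers


ok = read_response(data_frame(b"a") + data_frame(b"b") + trailer_frame(0))
failed = read_response(data_frame(b"a") + trailer_frame(13, "backend died"))
cut_off = read_response(data_frame(b"a"))  # ended on a message boundary, no trailer
```

With this shape, a response that ends on a message boundary without a trailer frame (`cut_off`) comes back with `trailers` set to `None`, so the receiver can treat it as a failure instead of a clean end.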

Using 2 websockets would be confusing because things could arrive out of order from how they were sent. And one websocket could fail but not the other, leading to a confusing mixed state. The whole reason for trailers was to make failures less confusing by having error messages in them.

Also, using websockets goes against the whole gRPC design idea. They wanted native HTTP2. We don't need websockets to fix the problem, we just need to implement trailers inside the stream instead of using native HTTP2 trailers. Implementing trailers inside the stream can be done with native HTTP2 streams or with websockets inside HTTP2 streams. It's a smaller change from the current protocol to put the trailers inside the native HTTP2 stream than to add websockets to the mix then implement trailers inside that.