by sapek 4174 days ago
1) I didn't do a good job explaining this. You are right that if you want to materialize an object during deserialization, you need to know a schema at build time to generate your class. But the crucial thing is, and this is true of all similar frameworks, you don't know the schema of the payload at that point. One big reason you use something like Protobuf, Thrift or Bond is to get forward/backward compatibility. What this means in essence is that deserialization is always a mapping between the schema you built your code with and the schema of the payload. There are two common ways to do that mapping: (a) the payload has schema information interleaved with the data, and you branch at runtime based on what you find in the payload (this is what Protobuf, Thrift and Bond tagged protocols do); (b) you get the schema of the payload at runtime and use that information to perform the mapping (this is what Avro and Bond untagged protocols do). The latter case is particularly suitable for storage scenarios: you read the schema from the file/stream header and then process many records that have that schema. This is the case where the ability to emit code at runtime results in a huge performance win: you JIT a schema-specific deserializer once and amortize that cost over many records.

2) You can do both. You can also do type-safe transformations/aggregations/etc. on serialized data w/o materializing any object.
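
To make the two mapping strategies from point 1 concrete, here is a rough Python sketch. Everything in it is made up for illustration: the toy wire format, the `INT32`/`STRING` type codes, and the function names are assumptions, not the actual encodings or APIs of Protobuf, Thrift, Bond or Avro. A plain closure over a precomputed plan stands in for the JIT-emitted deserializer.

```python
import struct

# Toy wire types (hypothetical, not any real framework's encoding).
INT32, STRING = 0, 1

def write_value(value, wire_type):
    """Encode one value in the toy format."""
    if wire_type == INT32:
        return struct.pack("<i", value)
    if wire_type == STRING:
        data = value.encode("utf-8")
        return struct.pack("<H", len(data)) + data
    raise ValueError(f"unknown wire type {wire_type}")

def read_value(buf, pos, wire_type):
    """Decode one value at pos; return (value, new_pos)."""
    if wire_type == INT32:
        return struct.unpack_from("<i", buf, pos)[0], pos + 4
    if wire_type == STRING:
        (n,) = struct.unpack_from("<H", buf, pos)
        start = pos + 2
        return buf[start:start + n].decode("utf-8"), start + n
    raise ValueError(f"unknown wire type {wire_type}")

# (a) Tagged: every value is preceded by (field_id, wire_type),
# so the reader branches on what it finds, field by field.
def encode_tagged(fields):
    return b"".join(
        struct.pack("<BB", fid, wt) + write_value(v, wt) for fid, wt, v in fields
    )

def decode_tagged(buf, known):
    record, pos = {}, 0
    while pos < len(buf):
        fid, wt = struct.unpack_from("<BB", buf, pos)
        value, pos = read_value(buf, pos + 2, wt)
        if fid in known:  # fields our build-time schema lacks are skipped
            record[known[fid]] = value
    return record

# (b) Untagged: the payload schema arrives once (e.g. in a stream header),
# records carry values only. The mapping is resolved once, up front.
def encode_untagged(payload_schema, rows):
    return b"".join(
        write_value(v, wt) for row in rows for (_, wt), v in zip(payload_schema, row)
    )

def build_untagged_reader(payload_schema, known):
    # Precompute (wire_type, target_name or None) per payload field.
    plan = [(wt, known.get(fid)) for fid, wt in payload_schema]

    def read(buf, pos=0):
        record = {}
        for wt, name in plan:
            value, pos = read_value(buf, pos, wt)
            if name is not None:
                record[name] = value
        return record, pos

    return read

# Our code was built knowing fields 1 and 2; the writer also emits field 3.
known = {1: "id", 2: "name"}
tagged_record = decode_tagged(
    encode_tagged([(1, INT32, 7), (3, INT32, 99), (2, STRING, "ada")]), known
)

payload_schema = [(1, INT32), (3, INT32), (2, STRING)]
stream = encode_untagged(payload_schema, [[7, 99, "ada"], [8, 100, "bob"]])
reader = build_untagged_reader(payload_schema, known)  # built once per stream
untagged_records, pos = [], 0
while pos < len(stream):
    rec, pos = reader(stream, pos)
    untagged_records.append(rec)
```

In (a) the schema lookup happens for every field of every record; in (b) it happens once in `build_untagged_reader`, and the per-record loop only follows the plan. A real implementation would presumably go further and emit specialized code for that plan, which is where the amortization over many records pays off.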