| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jiggawatts 1041 days ago

Security is a concern for every layer. It's not magic pixie dust that' can be sprinkled on top of software to renders it secure!

A while ago I read a great article about how the Adobe PDF serialization format is nearly impossible to secure because it allows inherently unsafe constructs.

For example, it allows cross-references that are basically just arbitrary unaligned pointers. It uses many different alignment and padding algorithms. It has length-prefixed and not-length prefixed sections. Etc, etc...

Apparently it was a serious research exercise to make a safe PDF parser, and they only covered a fraction of the full spec!

To put things in perspective: Originally, PDF allowed arbitrary code execution as a core feature, allowing the output of shell commands to be used as document content.

Most people like the Chromium and Firefox teams have just given up and now parse PDF using a sandboxed JavaScript VM because it's too hard to do it safely with C++. They parse HTML and JavaScript with C++, but not PDF. Think about that.

A similar issue caused Log4j, where a "format string parser" contained a vulnerability because it was too flexible and allowed network requests to be triggered by user-controlled data.

Even trivial, "surely it must be safe" formats like XML and JSON are riddled with security issues, such as different layers in a microservice architecture having different handling semantics for duplicate keys, null values, etc... This can result in exploits such as authentication and authorization tokens being interpreted by a system one way, but a different way by a different system. For real-world attacks along these lines, search for "request smuggling".

Serialization and parsing are security minefields and it is dangerously naive to just hand-wave that away.

See: https://seriot.ch/projects/parsing_json.html

1 comments

signa11 1041 days ago

> Serialization and parsing are security minefields and it is dangerously naive to just hand-wave that away. well, i am not hand-waving them away, i am not sure what can the serialization framework possibly _do_ to make things secure during the serialization ?

when execution of user-supplied code is allowed (in the examples that you have outlined above), surely, the layer _executing_ the code cannot really do anything about it ! perhaps you actually did intend to `rm -rf /` ?

policy checking, enforcement etc. has to happen at a higher / different layer. i am not sure why mechanism and policy are being conflated here.

in the same way, you gave the serialization layer a 10mb or whatever sized input to serialize, sure...you get an valid serialized output etc. maybe there is a genuine usecase for that in some context or another f.e. when serializing say image files, or something else etc. etc.

[edit] : minor comment.

link

jiggawatts 1040 days ago

> I am not sure what can the serialization framework possibly _do_ to make things secure during the serialization

Loads of things!

A strict specification that can only be interpreted one way goes very far. E.g.: a machine-readable BNF grammar file or something similar with no ambiguities.

A conformance test suite covering corner-cases is surprisingly effective, even with a supposedly perfect spec.

"Be strict with what you generate and lax with what you accept" has been demonstrated over and over again to be a disaster over the long-term in an ecosystem of many groups. Be strict always with what is accepted, not just generated!

Speaking of being strict: schema validation is essential. Strong typing for scalars helps a lot.

The actual implementations of the spec can obviously have a wide range of security features. Never allowing arbitrary type instantiation is critical, yet is a mistake that keeps reoccurring much like SQL injection.

Etc, etc...

link

signa11 1039 days ago

> I am not sure what can the serialization framework possibly _do_ to make things secure during the serialization

>> Loads of things!

>> A strict specification that can only be interpreted one way goes very far. E.g.: a machine-readable BNF grammar file or something similar with no ambiguities.

once again, that is not the domain of the serialization framework ! it is a policy which needs to be established and enforced at input / output layer by the entity which implements it.

a serialization framework should just serialize and deserialize objects to / from an i/o 'channel' f.e. file, network, etc. shackling it with specification / enforcement of security etc. policies seems conflating one concern with another.

link

eviks 1040 days ago

what's the best modern alternative that is designed in this way?

link

jiggawatts 1040 days ago

gRPC ticks most of the checkboxes.

Unfortunately these are lessons that have to be learned over and over. Anything based on JSON is generally suspect. If you see the terms "quick" or "simple" in some marketing splash-page, assume the author has not thought about the hard problems like security and long-term interoperability.

Similarly, if you find yourself hand-rolling RPC client code and calling methods on something like "HttpClient" manually, you've done it wrong. That code should have been spat out by a code-generator from a schema.

link

signa11 1039 days ago

> gRPC ticks most of the checkboxes.

huh :) ! gRPC is a 'r-p-c' framework, and uses protobuf for serialization. you should be comparing protobuf to cap'nproto.

link