| HN Mirror

by simiones 1323 days ago

We're stuck discussing JSON and other data transfer encodings, which is partly my fault as I brought it up, but there are far more scenarios for using combined text encodings.

It is very common to have templating languages that combine their own syntax with the syntax of a target output language. For example, Markdown supports HTML snippets that should be passed through to the final HTML as is; C macros support C code snippets; and C itself supports assembler snippets that should end up in the final binary. When generating or processing the mixed format from your own code, you may often hit the problems above.

Even for JSON, there are legitimate reasons for processing stored JSON documents as text, or at least situations where it seems a safe enough approach. People tend to forget that the string representation of a JSON document containing user-controlled input should itself be treated as untrusted user input in its entirety, at least unless it is parsed by a JSON parser.

Additionally, data often has to be stored to unstructured storage (e.g. disk) between the moment you receive untrusted user input and the moment you output the final format to the user - again, doing the easy thing of storing in the intermediate format with the first level of escaping of untrusted input is extremely tempting, and the alternative is significantly more difficult.
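To make the stored-JSON point concrete, here is a minimal Python sketch. The stored document, the `NAME` placeholder and both helper functions are hypothetical and exist only for illustration; the sketch also relies on Python's `json.loads` keeping the last value when a key is repeated.

```python
import json

# Hypothetical stored document containing a placeholder to be filled in later.
STORED = '{"admin": false, "greeting": "Hello, NAME"}'


def fill_as_text(stored: str, name: str) -> str:
    """The tempting approach: treat the JSON document as plain text."""
    return stored.replace("NAME", name)


def fill_as_data(stored: str, name: str) -> str:
    """Parse first, change the value, then serialize again."""
    doc = json.loads(stored)
    doc["greeting"] = doc["greeting"].replace("NAME", name)
    return json.dumps(doc)


# User-controlled input that closes the string and smuggles in a second "admin" key.
hostile = 'x", "admin": true, "y": "'

as_text = json.loads(fill_as_text(STORED, hostile))
as_data = json.loads(fill_as_data(STORED, hostile))

print(as_text["admin"])  # the injected key overrides the original one
print(as_data["admin"])  # the input stayed inside the greeting string
```

The text-level version still produces well-formed JSON, which is what makes it look safe: nothing fails until some consumer trusts the injected key.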

1 comment

jiggawatts 1322 days ago

All of the use-cases you listed I would flag in a code-review as fundamentally misguided.

If you have formats "A" and "B" with serialization functions A() and B() that take document object models as inputs (not strings!), then nesting them is valid, but a bit of a code smell.

What you're saying is that there are scenarios where A() and B() take strings and return strings, and those strings can have control codes that "mean something" for A and/or B.

That's inherently bad and dangerous, and was the direct cause of one of the WORST vulnerabilities in history. Literally as bad as anything ever out there.

You're saying "maximum bad" is a good idea sometimes. This is like making the argument that a little nuclear war is acceptable on occasion.
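A rough Python sketch of the A()/B() distinction, taking A as JSON and B as HTML. The tiny `Element` model, `render`, and the two `nest_*` functions are made up for this example and are not from any real library; only `json.dumps` and `html.escape` are standard.

```python
import json
from dataclasses import dataclass, field
from html import escape


@dataclass
class Element:
    """Hypothetical minimal HTML object model: a tag plus children."""
    tag: str
    children: list = field(default_factory=list)  # each child is an Element or a str


def render(node) -> str:
    """B(): serialize the object model, escaping every text node."""
    if isinstance(node, str):
        return escape(node)
    inner = "".join(render(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def nest_structured(payload: dict) -> str:
    """A() output becomes a text node in B's model, so B escapes it."""
    return render(Element("pre", [json.dumps(payload)]))


def nest_strings(payload: dict) -> str:
    """Strings in, strings out: A's output is pasted into B's syntax as is."""
    return "<pre>" + json.dumps(payload) + "</pre>"


payload = {"comment": "</pre><script>alert(1)</script>"}

print(nest_structured(payload))
print(nest_strings(payload))
```

In the string version the JSON serializer has done its job correctly, but characters that are harmless in JSON (`<`, `>`) are control characters in HTML, so the payload breaks out of the `<pre>` element.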

> there are legitimate reasons for processing stored JSON documents as text

No, there aren't. Stop. Never do this. Ever.

Don't parse HTML or XML with Regex either. It leads to m̷͉̈a̴̳̚d̶̟̐n̴̩̓e̷̘̿s̴̤͆s̵͉͗: https://stackoverflow.com/questions/1732348/regex-match-open...