
	by panic 3276 days ago
	If you use bytes that are invalid in UTF-8 (e.g. 0xF5-0xFF) as delimiters / structure characters, you can use text-only tools to do structured operations on UTF-8 strings without ever having to escape anything -- the structural "characters" you would need to escape can never appear in the encoded bytes.
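A minimal sketch of the idea, assuming 0xFF as a field delimiter and 0xFE as a record delimiter (both choices, and the helper names `pack` and `unpack`, are hypothetical and not from the comment). Since neither byte can occur in well-formed UTF-8, fields containing commas, tabs, or newlines round-trip without any escaping. One limitation of this sketch: an empty list and a single empty field both encode to an empty byte string.

```python
FIELD_SEP = b"\xff"   # assumed delimiter; 0xFF never occurs in valid UTF-8
RECORD_SEP = b"\xfe"  # assumed delimiter; 0xFE never occurs in valid UTF-8


def pack(records):
    """Encode a list of records (each a list of str) into one byte string."""
    return RECORD_SEP.join(
        FIELD_SEP.join(field.encode("utf-8") for field in record)
        for record in records
    )


def unpack(blob):
    """Split on the delimiter bytes first, then decode each field as UTF-8."""
    return [
        [field.decode("utf-8") for field in record.split(FIELD_SEP)]
        for record in blob.split(RECORD_SEP)
    ]


records = [["a,b", "line\nbreak", "tab\there"], ["naïve", "日本語", ""]]
blob = pack(records)
restored = unpack(blob)
```

The split has to happen on the raw bytes, before decoding: the packed stream as a whole is deliberately not valid UTF-8, only the individual fields are.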

2 comments

> If you use bytes that are invalid in UTF-8 (e.g. 0xF5-0xFF) as delimiters / structure characters

No need for that. UTF-8 is a superset of ASCII, and ASCII already includes a handful of control codes that could be used here.
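For comparison, a rough sketch of the control-code approach, assuming the ASCII unit separator US (0x1F) and record separator RS (0x1E) are the codes meant here; the comment does not name specific codes, and the helper names are hypothetical. Unlike bytes that are invalid in UTF-8, these code points can legally appear inside text, so this sketch rejects fields that contain them rather than escaping them.

```python
UNIT_SEP = "\x1f"    # ASCII US (unit separator), assumed field delimiter
RECORD_SEP = "\x1e"  # ASCII RS (record separator), assumed record delimiter


def pack_ascii(records):
    """Join records (lists of str) using ASCII separator control codes."""
    for record in records:
        for field in record:
            # These code points are valid in text, so a collision is possible.
            if UNIT_SEP in field or RECORD_SEP in field:
                raise ValueError("field contains a separator control code")
    return RECORD_SEP.join(UNIT_SEP.join(record) for record in records)


def unpack_ascii(text):
    """Split a packed string back into records and fields."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


rows = [["a,b", "line\nbreak"], ["naïve", "日本語"]]
packed = pack_ascii(rows)
unpacked = unpack_ascii(packed)
```

The result stays valid UTF-8 end to end, which is the appeal of this variant; the cost is that the separators are only safe as long as the data never contains them.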

The Unix shell operates on binary streams, not on text.

Ah, I thought this was about structured text (like JSON) vs. plain text -- binary content is a bigger issue!

If you want to replace the shell with a structured one, the change needs to be pervasive, down to replacing the typical file formats, imo.