by lelanthran 61 days ago
I can't tell what the argument is just from the slideshow. The main point appears to be that code pages, UTF-16, etc. all get called "plain text" without really being plain.

If that really was the argument, then it is, in 2026, obsolete; UTF-8 is everywhere.

2 comments

He has a YouTube channel, there's a talk on there.

He also discusses code pages etc.

I don't think the thesis is wrong. E.g., when I think of plain text I think of ASCII, so we're already disagreeing about what "plain text" is. His point isn't that we don't have a standard, it's that we've had multiple standards for what we think of as the most basic of formats, each with lots of hidden complications.

Tell that to programmers writing code to extract data from PCL print streams by stripping escape sequences and processing the result as "plain text" (in multiple incompatible extended ASCII encodings specified by the stripped escape sequences), or anyone exporting data from Excel in "CSV (Comma delimited) (*.csv)" format.

UTF-8 is everywhere. Until it's not. And it's impossible to distinguish UTF-8 from any other extended ASCII encoding given a sample containing only ASCII characters, so there's still no reliable way to process data that can only be described as "plain text".
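
To make the ambiguity concrete, here is a rough Python sketch. The candidate list and the `decodings` helper are my own picks for illustration, not anything from the talk. It decodes the same bytes under several encodings: an ASCII-only sample comes out identical under all of them, so it carries no evidence about which one was intended, while a single high byte already splits the results.

```python
from typing import Dict, Optional

# Illustrative candidates only; real data could be in many other encodings.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp437"]

def decodings(data: bytes) -> Dict[str, Optional[str]]:
    """Decode `data` under each candidate; None means the bytes are invalid there."""
    results: Dict[str, Optional[str]] = {}
    for enc in CANDIDATES:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

# ASCII-only bytes: every candidate yields the same text.
ascii_sample = b"name,price\nwidget,9.99\n"

# One byte above 0x7F: invalid as UTF-8, and a different character
# in cp437 than in latin-1/cp1252.
ambiguous = "caf\u00e9".encode("cp1252")  # b'caf\xe9'

if __name__ == "__main__":
    for label, sample in [("ascii", ascii_sample), ("high byte", ambiguous)]:
        print(label)
        for enc, text in decodings(sample).items():
            print(f"  {enc:8} {text!r}")
```

Failing to decode as UTF-8 does rule UTF-8 out, but decoding cleanly as latin-1 proves nothing, since every byte sequence is valid latin-1.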