by tgbugs 558 days ago
I made a design decision for a standard for dataset structure to explicitly ban characters beyond ASCII [A-Za-z0-9.,-_ ], precisely because all the positivity around UTF-8 often leads people to think that it comes with no additional complexity cost. There is an escape hatch, a way to indicate that a dataset uses Unicode filenames, but the standard states that any consumer may reject such datasets because Unicode support is explicitly not required.
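
For illustration only, here is a rough sketch of what a consumer-side check for a rule like this might look like, assuming it applies to filenames. The function name and the two flags are hypothetical and not taken from the standard, and the handling of the escape hatch (non-ASCII characters pass, disallowed ASCII characters still fail) is my assumption. One detail worth noting: written literally as a regex character class, `,-_` is a range from comma to underscore, so the hyphen is escaped here to get the intended set.

```python
import re

# Intended allowed set from the comment: ASCII letters, digits, period,
# comma, hyphen, underscore, and space. The hyphen is escaped so that
# ",-_" is not parsed as a range.
ASCII_ALLOWED = re.compile(r"[A-Za-z0-9.,\-_ ]+")


def is_allowed(name, declares_unicode=False, consumer_supports_unicode=False):
    """Hypothetical check: return True if `name` passes the sketched rule."""
    if not name:
        return False
    if ASCII_ALLOWED.fullmatch(name):
        return True
    # Escape hatch (assumed behavior): the dataset must declare Unicode
    # filenames AND this consumer must opt in; otherwise reject.
    if not (declares_unicode and consumer_supports_unicode):
        return False
    # Assumption: non-ASCII characters pass, but ASCII characters outside
    # the allowed set (such as "/" or ":") are still rejected.
    return all(ASCII_ALLOWED.fullmatch(ch) or ord(ch) > 127 for ch in name)
```

A consumer that never opts in simply leaves `consumer_supports_unicode` at its default and rejects every dataset that relies on the escape hatch, which is the point of making Unicode support optional.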

I got pushback for being a backward asciite from people who would not have to implement or maintain the systems, so seeing this article is rather vindicating.