Hacker News
by alangpierce 2347 days ago
Got it. So, if I understand correctly, someone might have saved a file whose name is the Latin-1 encoding of some non-ASCII text, and Mercurial would need to support such files (while having no contextual information that the name is Latin-1)?

I'm tempted to say "nobody should have filenames like that", but I guess a project like Mercurial needs to be as compatible as possible. Are there modern use cases for filenames like that, or is it fair to say it's all legacy data?

2 comments

It's going to be the case every time you mount a Windows file system, for example.

A big part of the problem is that a project like Mercurial doesn't have control over what files people use it on. They have to design for the pessimal scenario, because when the tool breaks, users complain.

If you want to write a version control system, banning a big chunk of filenames that are perfectly legal on both Linux and Windows seems like a bad choice. Users do have such filenames, and saying you can't store their files "because they aren't UTF-8" will annoy them.
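To make the "not UTF-8" case concrete, here is a minimal Python sketch. The name "café" and the helper `is_valid_utf8` are made up for illustration, and this is not a claim about how Mercurial itself handles filenames. It shows that a Latin-1 encoded name is a legal byte string that fails UTF-8 decoding, and that Python's `surrogateescape` error handler can still carry the bytes through text and back unchanged.

```python
# "café" stored as Latin-1 bytes, as a legacy tool might have written it.
# (The name is an illustrative assumption, not taken from the thread.)
raw_name = "café".encode("latin-1")  # b'caf\xe9'


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The undecodable byte is smuggled through as a lone surrogate code point,
# so encoding with the same handler restores the original bytes exactly.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(is_valid_utf8(raw_name))    # False
print(round_tripped == raw_name)  # True
```

A tool that insists on strict UTF-8 would have to reject `raw_name`; one that treats names as bytes (or uses a lossless escape like the above) can store it without knowing the original encoding.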

I've seen such filenames come from people using the name as a binary encoding in some way. As long as you avoid (from Wikipedia) NUL, \, /, :, *, ?, ", <, >, | you will end up with a filename that all OSes support, and some systems do that.
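A rough sketch of that check in Python, treating the filename as raw bytes. The byte set is an assumption based on the Windows-reserved characters (NUL, \, /, :, *, ?, ", <, >, |), and `avoids_reserved_bytes` is a hypothetical helper name. It only tests for those bytes; Windows has further rules (reserved device names such as CON, control characters, trailing dots and spaces) that this does not cover.

```python
# Bytes assumed to be rejected by at least one common OS.
RESERVED = set(b'\x00\\/:*?"<>|')


def avoids_reserved_bytes(name: bytes) -> bool:
    """Return True if no byte of the name is in the reserved set."""
    return not any(byte in RESERVED for byte in name)


# A name carrying arbitrary binary data passes as long as it dodges the set,
# even though it is not valid UTF-8.
print(avoids_reserved_bytes(b'caf\xe9'))   # True
print(avoids_reserved_bytes(b'a:b'))       # False
```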