Hacker News
by pabs3 1348 days ago
Is there a tool to check for byte order marks, zero width spaces and other "weird" Unicode characters?
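For the zero-width part of the question specifically, here is a minimal Python sketch. It is not taken from any existing tool. It flags every character in Unicode general category Cf ("format"), which covers U+200B ZERO WIDTH SPACE, U+FEFF (a BOM that decodes into the text), the joiners, the soft hyphen, and the bidi controls. The function name `invisible_chars` is hypothetical. Expect false positives, for example U+200D ZERO WIDTH JOINER inside emoji sequences. Expect misses too: U+00A0 NO-BREAK SPACE is category Zs, so it is not reported.

```python
import sys
import unicodedata
from typing import Iterator, Tuple


def invisible_chars(text: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (line, column, code point, name) for each format (Cf) character."""
    # Split on LF only; str.splitlines() would also break on U+2028, form feed, etc.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # The plain "utf-8" codec keeps a leading BOM as U+FEFF, so it is reported too.
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for lineno, col, cp, name in invisible_chars(text):
            print(f"{path}:{lineno}:{col}: {cp} {name}")
```

For example, `list(invisible_chars("a\u200bb"))` returns `[(1, 2, 'U+200B', 'ZERO WIDTH SPACE')]`.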

I wrote a cross-platform, standalone CLI program to do this.

It determines the end-of-line format, tabs, BOM, and NUL characters:

https://github.com/jftuga/chars
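Checks like the ones listed above can be approximated at the byte level. This is a minimal Python sketch, not chars' actual implementation. It assumes an ASCII-compatible encoding such as UTF-8: in UTF-16 every ASCII character carries a NUL byte, so the NUL and line-ending counts would be meaningless. The set of BOMs checked and the name `scan` are assumptions for illustration.

```python
import codecs
import sys

# Byte order marks looked for at the start of the data (an assumed, non-exhaustive set).
BOMS = {
    "UTF-8": codecs.BOM_UTF8,          # EF BB BF
    "UTF-16 LE": codecs.BOM_UTF16_LE,  # FF FE
    "UTF-16 BE": codecs.BOM_UTF16_BE,  # FE FF
}


def scan(data: bytes) -> dict:
    """Count line endings, tabs and NUL bytes, and report any leading BOM."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair holds exactly one CR and one LF; subtract those to get lone ones.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    bom = next((name for name, mark in BOMS.items() if data.startswith(mark)), None)
    return {
        "bom": bom,
        "crlf": crlf,
        "lf": lone_lf,
        "cr": lone_cr,
        "tab": data.count(b"\t"),
        "nul": data.count(b"\x00"),
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, scan(f.read()))
```

For example, `scan(b"\xef\xbb\xbfa\tb\r\nc\n")` returns `{'bom': 'UTF-8', 'crlf': 1, 'lf': 1, 'cr': 0, 'tab': 1, 'nul': 0}`.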

Nice. Would it be possible to have an option that only outputs the names of files that fail the -f check? That is, hide the names of files that look "normal" and show only the "weird" ones.

Also, does it detect files that only contain CR as EOL characters? Or files that have different EOL characters on different lines?

I like the idea of only showing filenames that fail with -f, so I created an issue for that. According to:

https://en.wikipedia.org/wiki/Newline#Representation

CR alone does not really appear to be used as an EOL marker. Also, I don't think having different EOL chars within the same file is really a thing.

Thanks for the issue, I've subscribed to it.

https://github.com/jftuga/chars/issues/2

According to the page, several machines and operating systems used CR as EOL. While those systems are all obsolete, files from that era that use CR as EOL could persist and be transferred to modern systems. Such files are clearly weird on modern systems, so a linter should warn about them, and linting is exactly what I would like to use your project 'chars' for.

Having different EOL chars within the same file is definitely a thing, usually by mistake. I had to fix a bug about this recently:

https://github.com/EionRobb/purple-discord/pull/416
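Both cases raised here, CR-only files and files that mix EOL styles, can be caught by collecting every line ending that occurs. This is a minimal Python sketch with hypothetical function names, not how chars does it. It treats uniform LF or uniform CRLF as acceptable and warns on anything else.

```python
import re
from typing import Optional, Set

# Try CRLF first so a "\r\n" pair counts as one CRLF ending, not a CR plus an LF.
EOL_PATTERN = re.compile(rb"\r\n|\r|\n")
EOL_NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def eol_styles(data: bytes) -> Set[str]:
    """Return the set of line-ending styles that occur anywhere in data."""
    return {EOL_NAMES[m] for m in EOL_PATTERN.findall(data)}


def eol_warning(data: bytes) -> Optional[str]:
    """Return a lint message for CR-only or mixed line endings, else None."""
    styles = eol_styles(data)
    if len(styles) > 1:
        return "mixed line endings: " + ", ".join(sorted(styles))
    if styles == {"CR"}:
        return "CR-only line endings"
    return None
```

For example, `eol_warning(b"a\r\nb\n")` returns `"mixed line endings: CRLF, LF"`, `eol_warning(b"a\rb\r")` returns `"CR-only line endings"`, and a file that uses LF throughout yields `None`.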

I added a new command-line switch, -F:

* When used with -f, only display a list of failed files, one per line.

https://github.com/jftuga/chars/releases/tag/v2.3.0
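The filtering that -F adds (print only failing files, one per line) can be sketched in a few lines of Python. This is not chars' code. The failure rule `has_bom_or_nul` is a hypothetical stand-in for whatever the -f check actually tests. `failed_files` takes the check as a parameter, so any predicate on the file's bytes can be plugged in.

```python
import sys
from typing import Callable, Iterable, List

# Leading byte order marks for UTF-8, UTF-16 LE and UTF-16 BE.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")


def has_bom_or_nul(data: bytes) -> bool:
    """Hypothetical failure rule: a leading UTF-8/UTF-16 BOM or any NUL byte."""
    return data.startswith(BOMS) or b"\x00" in data


def failed_files(paths: Iterable[str], check: Callable[[bytes], bool]) -> List[str]:
    """Return only the paths whose contents fail check, preserving input order."""
    failed = []
    for path in paths:
        with open(path, "rb") as f:
            if check(f.read()):
                failed.append(path)
    return failed


if __name__ == "__main__":
    # Print failing files one per line, so the output can be piped to other tools.
    for path in failed_files(sys.argv[1:], has_bom_or_nul):
        print(path)
```

One plain path per line keeps the output easy to feed into `xargs` or a CI script. A linting wrapper would probably also want a nonzero exit status when the list is non-empty.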