| HN Mirror

	by pabs3 1395 days ago
	Is there a tool to check for byte order marks, zero width spaces and other "weird" Unicode characters?
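
	For illustration, one way to frame the question in code is to scan decoded text for characters in suspicious Unicode general categories. This is only a sketch: the choice of categories (Cf, Co, Cn, plus control characters other than tab, LF and CR) and the function name `find_weird_chars` are assumptions made here, not part of any tool mentioned in this thread.

```python
import unicodedata

# Assumed definition of "weird": format characters (Cf, which covers the
# zero-width space and U+FEFF), private use (Co), and unassigned (Cn).
SUSPECT_CATEGORIES = {"Cf", "Co", "Cn"}
# Control characters (Cc) are flagged too, except ordinary whitespace.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_weird_chars(text):
    """Return (line, column, codepoint, name) for each suspicious character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cat = unicodedata.category(ch)
            if cat in SUSPECT_CATEGORIES or (cat == "Cc" and ch not in ALLOWED_CONTROLS):
                # Control characters have no Unicode name, hence the default.
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits
```

	Positions are 1-based. The text has to be decoded first, so a file in an unexpected encoding would need separate handling.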

1 comments

jftuga 1395 days ago

I wrote a cross-platform, standalone CLI program to do this.

It reports the end-of-line format and detects tabs, BOMs, and NUL characters:

https://github.com/jftuga/chars
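
As a rough illustration of the kind of byte-level checks described above (not the actual `chars` implementation, which is not shown in this thread), a minimal Python sketch might look like the following. The function name `inspect_bytes` and the shape of its result are assumptions.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32-LE"),
    (codecs.BOM_UTF32_BE, "UTF-32-BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16-LE"),
    (codecs.BOM_UTF16_BE, "UTF-16-BE"),
]


def inspect_bytes(data):
    """Report the leading BOM (if any) and count tab and NUL bytes."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)
    return {"bom": bom, "tabs": data.count(b"\t"), "nuls": data.count(b"\x00")}
```

Working on raw bytes avoids having to guess the encoding before looking for a BOM. Note that UTF-16 and UTF-32 text will naturally contain many NUL bytes.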

pabs3 1395 days ago

Nice. Would it be possible to have an option to output only the names of files that failed the -f check? That is, hide the names of files that look "normal" and show only the "weird" ones.

Also, does it detect files that only contain CR as EOL characters? Or files that have different EOL characters on different lines?

jftuga 1395 days ago

I like the idea of showing only the filenames that fail with -f, so I created an issue for it. According to:

https://en.wikipedia.org/wiki/Newline#Representation

CR does not appear to really be used as EOL. Also, I don't think having different EOL chars within the same file is really a thing.

pabs3 1394 days ago

Thanks for the issue, I've subscribed to it.

https://github.com/jftuga/chars/issues/2

According to the page, several machines and operating systems used CR as EOL. While those systems are all obsolete, files from that era that use CR as EOL could persist and be transferred to modern systems. Clearly those are weird on modern systems, so they should trigger a warning in a linting situation, which is where I would like to use your project 'chars'.

Having different EOL chars within the same file is definitely a thing, usually by mistake. I had to fix a bug about this recently:

https://github.com/EionRobb/purple-discord/pull/416
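
To make the two cases concrete (CR-only files and files that mix EOL styles), here is a small Python sketch that counts each EOL style in a byte string and classifies the result. The names `count_eols` and `classify_eols` are made up for this example and are not taken from 'chars'.

```python
import re


def count_eols(data):
    """Count CRLF, bare LF, and bare CR line endings in a byte string."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    # The alternation tries \r\n first, so a CRLF pair is never
    # counted as a separate CR followed by an LF.
    for match in re.finditer(rb"\r\n|\n|\r", data):
        token = match.group()
        if token == b"\r\n":
            counts["CRLF"] += 1
        elif token == b"\n":
            counts["LF"] += 1
        else:
            counts["CR"] += 1
    return counts


def classify_eols(data):
    """Return 'none', 'mixed', or the single EOL style in use."""
    used = [name for name, n in count_eols(data).items() if n]
    if not used:
        return "none"
    if len(used) > 1:
        return "mixed"
    return used[0]
```

A linter could then warn on both "CR" and "mixed" results.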

jftuga 1393 days ago

I added a new command-line switch, -F:

* when used with -f, only display a list of failed files, one per line

https://github.com/jftuga/chars/releases/tag/v2.3.0
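
The "only list the failures" behavior can be sketched in Python as below. This is a hypothetical illustration, not how 'chars' implements -F: the thread does not spell out what the -f check treats as a failure, so the rule in `looks_weird` (a UTF-8 BOM, a NUL byte, or a bare CR) is an assumption, as are both function names.

```python
from pathlib import Path


def looks_weird(data):
    """Hypothetical failure rule: a UTF-8 BOM, a NUL byte, or a bare CR."""
    has_bom = data.startswith(b"\xef\xbb\xbf")
    has_nul = b"\x00" in data
    # Remove CRLF pairs first; any CR that remains is a bare CR.
    has_bare_cr = b"\r" in data.replace(b"\r\n", b"")
    return has_bom or has_nul or has_bare_cr


def failed_files(paths):
    """Return only the paths whose contents trip looks_weird, in input order."""
    return [str(p) for p in paths if looks_weird(Path(p).read_bytes())]


if __name__ == "__main__":
    import sys

    # Print one failing filename per line; passing files are not listed.
    for name in failed_files(sys.argv[1:]):
        print(name)
```

Printing one name per line keeps the output easy to pipe into other tools.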
