Hacker News new | ask | show | jobs
by djha-skin 1349 days ago
> If we drop those markers (1110 and 10 in front of bytes) and keep the remaining bits we're left with 1111111011111111, which evaluates to 65279, which is in hexadecimal 0xfeff. Yes, you recognize it, it's a BOM. Because yes a BOM is just a ZERO WIDTH NO-BREAK SPACE, isn't it beautiful?

Byte Order Marks have stolen hours and days of my life. Anyone suffering the pain of developing on a windows box can relate. Windows puts BOMs by default in the front of every file. Thus windows programs silently ignore it, but then linux machines run the program and choke on the BOM. You have to specifically ask the editor if the BOM is even there, it doesn't show up in the editor by default. I have specific lines in my .vimrc[1] that prevent BOMs from ruining my day/week, but they still pop up often. I often joke there will be a byte order mark on my tombstone, along with avahi daemon.

1: https://git.sr.ht/~djha-skin/dotfiles/tree/main/item/dot-con...

5 comments

> Byte Order Marks have stolen hours and days of my life.

Me too, to some degree. I have discovered them in a Ruby code base at work, in the middle of a line of code (copy pasted), where the Ruby interpreter thinks they are undeclared identifiers. When the code runs, it throws an exception every time that complains of “Undeclared identifier `‘”.

The dad-joke of it is that “You gotta sweep for BOMs before they blow up your code.”

See my other comment in this thread about my project:

https://github.com/jftuga/chars

I have written a cross platform, stand alone CLI program to inspect a file for BOM8 and BOM16. It also detects if a file uses CRLF or LF. Tab and nul characters are also evaluated. Please see the Examples in my repo:

https://github.com/jftuga/chars

Love it!
I've dealt with two elusive bugs which were ultimately caused by Windows stupidly using UTF-8 with BOM by default. Python requires you to take extra steps to decode that garbage, and some C++ libraries can't handle it at all.

I'm sure there were good reasons that BOM sounded like the right idea at Microsoft, but everyone else just used straight UTF-8 and it was fine.

Windows supported Unicode in 1993 (NT 3.1) and 1995 (Win95) via UCS-2, a fixed-width 16-bit encoding.

In 1996, it was realized 16-bit wasn't enough, and was expanded in Unicode 2.0, which also included UTF-16, a variable-width encoding, which required the BOM.

Windows 2000 supported UTF-16 on release.

Why didn't Windows 2000 support UTF-8, which was invented in 1992 and implemented in Plan9 in that same year? Who can say...

> along with avahi daemon

Tell us more!

Back in 2014 I was trying to set up a Linux machine and bind it to the active directory domain at work. The active directory domain was a .local domain, but avahi Daemon thinks any packet that's bound for a DOT local address is addressed to it. So it would swallow up all the packets that were headed to the domain controller, look at them, think they were weird and not understand them and then drop them on the floor. From my perspective it looked like the firewall just hated me.

It was like a week or two later until I finally went to my friend and said I must be stupid but I can't do this it's not working and he just disabled the avahi daemon and everything started working again.

Blarg.

Oof. On the other hand...

> The active directory domain was a .local domain

.local is a reserved domain for mDNS (aka ZeroConf or Bonjour, the stuff Avahi handles), standardized in early 2013.

Then again, 2014 is soon enough after for that for knowledge not to have percolated everywhere, and/or for it to stomp on older networks that had used .local beforehand.

Microsoft recommended using .local for active directory domains since the 1990s, I think because back then it was not reasonable to demand that their customers register a domain name at a time when that was a massive hassle. But it was still wrong to squat on a TLD: there were already moves to expand the number of TLDs at the time, but MS were very slow to correct their mistake.

Then Apple made the same mistake with Bonjour / mDNS, and the IETF standardized Apple’s use of .local and it all became an even worse mess.

I've seen consultants set up ActiveDirectory as "companyname.ad", as if Andorra didn't exist.
> nobomb

Sounds like that is a good choice for the option name