Hacker News new | ask | show | jobs
by jakegmaths 16 days ago
Windows really needs to ditch CRLF and just use LF, and switch from backslashes to forward slashes. Or better yet, just switch everything to full POSIX.

In powershell everything is much better than cmd, but it's just not enough.

WSL is generally great, but there are annoying downsides. I often get "catastrophic" crashes and the zone identifier files drive me nuts. Plus it takes so much longer to start VSCode when connecting with WSL, and now you've got two file systems. WSL1 was in many ways better than WSL2 for these reasons.

16 comments

Windows is also a rare bird in UTF-16.

"UTF-16 is used by the Windows API, and by many programming environments such as Java and Qt. The variable-length character of UTF-16, combined with the fact that most characters are not variable-length (so variable length is rarely tested), has led to many bugs in software, including in Windows itself.

"UTF-16 is the only encoding (still) allowed on the web that is incompatible with 8-bit ASCII. It has never gained popularity on the web, where it is declared by under 0.004% of public web pages (and even then, the web pages are most likely also using UTF-8). UTF-8, by comparison, gained dominance years ago and accounted for 99% of all web pages by 2025."

https://en.wikipedia.org/wiki/UTF-16

NT shipped with USC-2 as UTF-8 (and -16) did not yet exist. USC-2 naturally translated to UTF-16, hence the choice. NT/Win32 is also designed for fixed-with code units, something UTF-8 doesn't support.

You can use UTF-8 on a per-application basis, within limits.

https://learn.microsoft.com/en-us/windows/apps/design/global...

Conversely, UEFI is UTF-16 only, thanks to Windows.

UTF-8 only would be an ABI breaking change, so that's not going to happen. We don't want the NT kernel to end up like Linux, after all :-)

> UTF-8 only would be an ABI breaking change, so that's not going to happen. We don't want the NT kernel to end up like Linux, after all :-)

I think you're making a joke, but it still doesn't make sense. Linux does avoid breaking changes to its userspace ABI

Linux kernel's ABI/API surface is completely byte-based and is tiny compared to Win32 API. The stable ABI of Windows is strictly in the user space and it covers the entire useful operating system. Don't forget that glibc isn't that ABI stable nor anything that goes all the way up to systemd/X11/gettext/Wayland/cairo/GTK/Qt/glade/Pipewire/xdg. Nothing equivalent in Linux environment is stable, especially compared against Win32 system libraries.

You encounter encoding requirements not in kernel system calls but in more user-facing sides of the OS. So your comparison should include all the userspace components of a Linux system (e.g. gettext), if you're considering the places where you encounter actual Unicode string-based operations where UTF-16 comes into play in Windows.

ALL OF THE equivalent Windows libraries (user32.dll, kernel32.dll, ws2_32.dll, shellex.dll ...) to the Linux ones I counted above and more are ABI and API stable.

See, I was sympathetic to that view, except that they specifically posted

> We don't want the NT kernel to end up like Linux, after all

(Emphasis mine)

And that's... The one part of the OS where Linux is ABI stable.

I think they didn't mean Linux as in the kernel sense but in the overall "Family of OSes and ecosystem with very unstable APIs" sense.
Additional Detail: it is specifically utf-16 little endian when a byte order mark is not used, which is the opposite of the recommended choice of big endian in the RFC.

Worse are the byte order marks required to support both endians that end up in files.

The development of Windows NT based on UCS-2 precedes the RFC by roughly a decade, and little-endian was the natural choice for the Intel PC platform. Obviously the endianness had to remain the same when UCS-2 was extended to support UTF-16.
Also true. The BOMs though are annoying.
UTF-16 is also used by C#, Java, and JavaScript. Since JavaScript is so widely adopted, I wouldn't call it a rare bird. Not necessarily used when reading or writing files, but it's what's used internally for the strings. As a result, your strings use UTF-16 surrogate pairs to represent characters outside of the basic multilingual plane (such as Emoji).
UTF-16 is the internal format of the ICU library (International Components for Unicode, the support library from the Unicode standards people) which is a common way to add "full fat" Unicode support to a programming language. This has knock-on effects everywhere. If you're using ICU, you either use UTF-16, too, or you constantly convert back and forth every time you interact with ICU. You're often best off using UTF-16 in memory and only converting to UTF-8 when you write files or transmit over the network.
> Windows is also a rare bird in UTF-16.

Web browsers use UTF-16 internally. So Windows isn't even largest "platform" that uses UTF-16.

> Windows is also a rare bird in UTF-16.

an interesting tidbit, some Windows kernel developer realized that most registry keys are ascii anyways so they could save up to 50% space simply by storing the name as ascii. The flag is called "compressed name" and they will pad with 0x00 when reading the name to make a proper utf-16 string.

No worries, eventually Xenix will make a comeback.

You cannot ditch CRLF, Microsoft isn't Apple.

Windows accepts backslashes and forward slashes, only old applications that manually search for one of them get it wrong.

> Or better yet, just switch everything to full POSIX.

Really not possible as most of POSIX semantics arise naturally from the kernel (or are enforced/executed at the kernel level). Windows technically provides some of them (or semantic equivalents) so you could make something work, but in order to do a full port you'd need to strip out too many concepts for it to be worthwhile. For instance the idea that "everything is a file" or the single root filesystem layout (which iirc is segmented deeply at the kernel level).

> Really not possible as most of POSIX semantics arise naturally from the kernel (or are enforced/executed at the kernel level).

The kernels undoubtedly take different approaches. But there's nothing in NT that strictly prohibits POSIX compatibility layers. As we see with the many compatibility layers that have existed for Windows over the years.

> For instance the idea that "everything is a file"

POSIX doesn't have a concept of "everything is a file". That's Plan 9. UNIX and POSIX actually made numerous concessions here and there are plenty of constructs that are not exposed as a file.

Windows does already abstract a few primitives as files too. In fact even DOS has the concept of device files, though in typical Microsoft fashion, it's implementation was a clusterfuck that took MS 40 years to fix.

> or the single root filesystem layout (which iirc is segmented deeply at the kernel level).

NT uses an object system for filesystem objects, but it still has a root. As you can see in the WinObj (eg screenshot below)

https://learn.microsoft.com/en-us/sysinternals/downloads/med...

The C:\ convention is really more there for compatibility with DOS-lineage software but the underlying filesystem APIs work fine with NT objects.

For example, C:\foo.txt might be equivalent to \Device\HarddiskVolume3\foo.txt

In this regard, it's similar to how POSIX has a database of inodes and the filesystem hierarchy is actually just an abstraction that sits on top of that

> ... Or better yet, just switch everything to full POSIX.

Interix[0] did a pretty good job of this, but MSFT killed it. I was compiling GNU tools w/ GCC and running bash under Interix back in in 2000 under Windows 2000. It was grand.

[0] https://en.wikipedia.org/wiki/Interix

I ported a million-line, OpenGL/Motif application using Interix (before it was acquired). Indeed, it was grand.
The experience with Windows Services for UNIX wasn't that great, so yet another good product killed via acquisitions?

I only know it post acquisition.

They might as well create a new OS with a different name, because none of the existing applications will work, and no enterprise customer will use it.
Most (everything?) on Windows actually works with forward slashes. However, much of the tooling will overwrite your version with a backslash wherever it can.
At which layer? dir /w, for example, is never going to list the contents of \w.
All the layers, except applications where devs still look for the wrong slash manually.

OSes and modern programming languages have APIs for file navigation for a reason.

On Windows this implies old applications from another era, good old command window, or people that don't know better.

The kernel and ntfs does not care about slash direction. Specific programs like dir might though and honestly if you're on windows just use powershell and avoid legacy cmd stuff.
Additionally on programming languages use the apis for path management, instead of explicit looking for the \ or / characters.
Use powershell.

Get-ChildItem /w

Too many keystrokes, people won't use it.
gci /w than?
ls /w also works
Aliases
Don't think this will ever happen, especially since this is Microsoft you are talking about [0]

[0] https://www.youtube.com/watch?v=bC6tngl0PTI

On Windows, I've mostly avoided CRLF by configuring text editors and git to use LF, and writing text files in binary mode.

The only places that still forced CRLF were batch files and clipboard.

> I've mostly avoided CRLF by configuring text editors and git to use LF,

That has been my experience as well. I can't remember the last time I had an issue related to CRLF.

The only times I can remember having line-ending issues is using GNU's tools on Linux. Every Windows tool I can remember using accepts both CRLF and LF.
Yesterday for me was the last time. Visual studio 2026 default to crlf I think maybe and I have autoctlf in git turned off. I should probably turn that back on.
I think you want a .gitattributes file where you set the EOL of the file types you care about. And you might want to try autocrlf=input.
Thanks for this :)
does visual studio support editorconfig.org? I use end_of_line = lf for all my projects
A lot of tooling that generates or transform text-like files (e.g. XML) outputs native line endings, which in case of Windows is CRLF. Depending on what you do, it’s almost impossible to avoid that.
I have absolutely zero sympathy for any tool that is incapable of handling \r\n and only works with \n. Literally absolutely no sympathy.

All software accumulates warts over time. Linux is overflowing with horrible warts and tech debt. As is any software that has successfully served customers for decades.

But multiple line endings are quite possibly the easiest most trivial thing to support and there is absolutely no negative cost of any kind in doing so. Linux ecosystem chooses to be stubborn and provide a strictly worse user experience out of pure spite and for zero user benefit. It’s very irritating.

> But line endings are quite possibly the easiest most trivial thing to support and there is absolutely no negative cost of any kind in doing so. Linux ecosystem chooses to be stubborn and provide a strictly worse user experience out of pure spite and for zero user benefit. It’s very irritating.

The Linux ecosystem handles it fine (by using a single standard). Windows doesn't. That's its problem.

> The Windows ecosystem handles it fine (by using a single standard). Linux doesn't. That's its problem.

It's always funny to see how the fanbois treat their way as the one and only 'True Way'.

I didn't say it was the one and only True Way. My intended meaning - which I admit I may have poorly conveyed - is that tools from the unix ecosystem are intended to work on unix conventions, and do, and that works. Windows has different standards, which is also fine, but it follows that you shouldn't expect unix tools to follow Windows standards even if you make them run on Windows. This is like getting Windows software to run under WINE and then complaining that it doesn't use /n newlines and that it should change to accommodate Linux (or whatever). No, a Windows program will follow Windows standards even when made to run on a unix-like. And in the same way, unix-family software is going to follow unix standards even on Windows.
Your entire framing is wrong imho.

Multi-platform is very easy and a solved problem if you try juuuust a tiny amount.

For example the Rust stdlib iterator for lines() handles both conventions. It just works. Very easy.

I live in a cross-platform world. Line endings in text files should not be a breaking problem because some CLI tool refuses to support both. That’s just plain bad software engineering.

I expect Unix tools that process text files to be capable of supporting text files that have different conventions. This is very easy. Refer to previous comments on stubbornness out of spite.

Even Python has str.splitlines, which accepts a range of line separators (not just LF and CRLF), and the same when iterating over a file handle (which iterates over lines).

You're right, this is a complete nonissue.

UNIX and Windows aren't the only OSes on the planet, even though UNIX folks tend to assume that.

Heck many mistake UNIX with GNU/Linux even, and then complain when given a UNIX that isn't a Linux distro.

I think the point is that line endings are a really, really stupid hill for either Linux or Windows to die on. The day when any programs should have cared about line endings came and went decades ago.
Complete and utter nonsense. Every Windows tool I remember using has handled LF-only endings perfectly fine, meanwhile Linux tools regularly fail to handle CRLF endings.
> Every Windows tool I remember using has handled LF-only endings perfectly fine

cough notepad cough

Comments like this really make me wish HN allowed you to block users. Alas.
If you have something to say, say it. This is barely even an ad hominem.
"Windows really needs to ditch CRLF and just use LF, and switch from backslashes to forward slashes."

Hahahahaha. That's hillarious.

Oh god, you're serious?

Do you have any idea how much of Windows, and user software would break? Any idea at all?

You really want MS, who has built backwards compatibility as a core feature of Windows, to break countless thousands of pieces of software that run on it?

I'm sure there's some idealized fantasy in which that change gets wrapped in a neat little abstraction that prevents anything from breaking. I promise you, there is no way of encapsulating or abstracting that change that works for everyone.

If I could wave a magic wand and make it so without breaking it, I would. But it's a fantasy.

> break countless thousands of pieces of software

I'd estimate millions

>> Windows really needs to ditch CRLF and just use LF, and switch from backslashes to forward slashes.

{To the extent such stuff is pragmatic:} I think we should switch to Pascal-style strings everywhere, and then have no need for having special-purpose characters like path separators (a path now being a list of strings).

Windows has been doing for quite a while, that is how COM strings work.
the two filesystems can be a super power... i seamlessly use the same driver between wsl2 and my dual booted opensuse.
Yeah I don't mind/like the two file systems. Looks like MS is taking it further too they also announced WSL Containers & API.
honestly your point is a bit weird.

powershell is good. its much better than unix's everything piped is Text idea. godawfull that. outputs being objects is a really solid take.

WSL is trash.

besides that, lf vs. crlf is silly as you mention but crlf is more logical considering what its implementing. that being said the notion of these control chars is already based on outdated and limited ideas.

if you want a consistent system to do things with dont pick a system which tries to be two systems.

Linux has wine. Windows has WSL.

I'd recommend BSD. any flavor will do.

might take some adjustments but you will have a more 'rational' system if that is what you desire.

(otherwise, embrace the madness!)

> outputs being objects is a really solid take.

Glad I'm not alone here ha!

Being able to go someoutput | Format-Table | Select ColumnName,ColumnName,CloumnName is great. Beats memorizing the output format of any specific command and trying to wrangle it with awk.

I'm a big Linux advocate with limited experience on modern versions of Windows, but PowerShell objects are great. So is the Unix way of doing text. I think the strengths of each approach are in different use cases. Unix style is better for interactive usage because it's fast, I can type df -h | grep /home very quickly. Object output is better for scripts that can, thanks to objects, store and operate on sensible data while Bash scripts do a lot of ad-hoc data extraction/reformatting with string expansion, awk and whatever else to get data to the next step in the script.
Inspired by the ways of Xerox, Genera and ETHZ.

It can be verbose, but the object approach, and having .NET, DLLs and COM as first class primitives is quite flexible.

Naturally you can have a similar experience on UNIX, e.g. fish, but very few do it.

You can skip the `| Select` btw.

Just do `someoutput | ft ColumnName,ColumnName,CloumnName`

maybe windows should actually just become linux
No, they need to ditch drive letters first. The NT kernel and NTFS don't even require them (I used to mount disks without drive letters back in the NT 4 era). They just don't care enough to get rid of this annoyance.
Nobody wants to use \??\GLOBALROOT\Device\HarddiskVolume3\ in their paths.
Nonsense. You can mount filesystems to mount points in much the same way as is done in Unix. No one would ever need to do that.
You can indeed use mount points like C:\mountdir, but that's still on the C drive, which is a drive letter. It's not "no drive letters".
And if \ was an alias for C:\ this would just be \mountdir.
users , especially non-technical, find it highly useful in my experience. Is it a net positive to get rid of them, or will it largely only make developers happier ?
It’s arcane and technical for no reason. /Users/ME/Documents, /Media/MyThumbDrive/…, etc. are much clearer and less confusing than C:\…
At the very least, drive letters do make SMB shares a bit simpler for the non technical folks. T:\MyData is easier for them than \\0010-somehost-win.site1.mycorp.loca\Share01\MyData\

I used to support a group of completely tech illiterate users in construction & manufacturing. Them figuring out T:\ was hard enough, ask them to type in a UNC path into the address bar in explorer and you get "Wtf is file explorer? Wtf is an address bar? Where is the backslash key??"

Then in this hypothetical world you could mount \\0010-somehost.win.site1.mycorp.local\Share01 to /Share01 rather than T:
Or just use sane names like \\MyDivision\Share01\MyData and mount that to \Network\Share01 or some such.
>Windows really needs to ditch CRLF...

Windows needs to ditch itself.