Hacker News
by james412 2115 days ago
How would one go about implementing case-insensitive path lookup against a SMB share containing a few million files from userspace?

It must be in the kernel. Else implementations of stuff with human-derived semantics should move out of the kernel, but moving SMB into userspace would of course be ridiculous

So it complicates any mapping of filenames to data structures in the kernel (all 3 of them?), big deal. Every popular desktop operating system supports it, and basically the only reason Linux does not is a mixture of FUD and fear that we may need to update the code at some point due to changes in human culture, the horror!

Meanwhile, typing "cd music" in a terminal need not print "No such file or directory" when there is clearly a folder named "Music", as if the $3k worth of gear in front of me had the complexity of some 1950s B-movie sci-fi robot.

2 comments

> How would one go about implementing case-insensitive path lookup against a SMB share containing a few million files from userspace?
>
> It must be in the kernel. Else implementations of stuff with human-derived semantics should move out of the kernel, but moving SMB into userspace would of course be ridiculous

Case-insensitive path lookups on SMB happen in the server (usually Samba), in user space. The client is also usually in user space, through FUSE or the client libraries, but CIFS of course exists as a kernel-mode alternative.

And as I have written elsewhere, sure, "music" vs. "Music" is simple when you live in an ASCII world. Trying to be smart with user input only causes trouble for the rest of us. Hiragana and katakana are also logically the same, and on a kana keyboard mixing them up is a similar typo. Simplified and traditional Chinese are also logically the same.
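To make the ASCII-versus-everything-else point concrete, here is a minimal sketch. It assumes Python's built-in `str.casefold()` as the folding rule; the helper name `same_name` is made up for illustration and is not what any particular filesystem does.

```python
def same_name(a: str, b: str) -> bool:
    """Compare two names using Unicode case folding only (an assumed rule)."""
    return a.casefold() == b.casefold()


# ASCII case differences are folded away.
print(same_name("music", "Music"))       # True

# Hiragana vs. katakana spellings of the same word are not a "case"
# difference as far as Unicode case folding is concerned.
print(same_name("おんがく", "オンガク"))  # False
```

Case folding covers the first pair but not the second, so "logically the same" needs a policy decision beyond folding case.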

> Every popular desktop operating system supports it

Doesn't mean we should break stuff here as well.

> How would one go about implementing case-insensitive path lookup against a SMB share containing a few million files from userspace?
>
> It must be in the kernel.

SMB in the kernel is a rather dangerous game. Not saying it can't be done (and I know Samsung is doing it :-), but it's a significantly more complex protocol than even NFSv4 (which really shouldn't be in the kernel either). For complexity's sake, userspace is easier IMHO (much easier to debug).

Also, you seem to be of the opinion that code in the kernel must be "magic", in that it can do things that user-space can't. The "missing case" lookup problem still exists for kernel code as it does for user-space code.

Looking up "foo" in all case variations in a directory containing a few million files still has to be done by search in the kernel, just as in user space. It's going to be slow there too unless you provide a case-insensitive indexing mechanism.
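A rough sketch of the difference, assuming the directory listing is already in memory as a list of names and that `str.casefold()` is the folding rule. The function names are hypothetical. The same trade-off applies whichever side of the kernel boundary the code runs on.

```python
from collections import defaultdict


def scan_lookup(names, wanted):
    """Linear search: fold and compare every entry, O(n) per lookup."""
    key = wanted.casefold()
    return [n for n in names if n.casefold() == key]


def build_index(names):
    """Build a case-insensitive index once: folded name -> real names."""
    index = defaultdict(list)
    for n in names:
        index[n.casefold()].append(n)
    return index


def index_lookup(index, wanted):
    """Average O(1) per lookup once the index exists."""
    return index.get(wanted.casefold(), [])


names = ["Music", "notes.txt", "FOO", "foo"]
index = build_index(names)
print(scan_lookup(names, "foo"))   # ['FOO', 'foo']
print(index_lookup(index, "Foo"))  # ['FOO', 'foo']
```

Note that both lookups can return more than one name, because a case-sensitive store may already hold entries that collide once folded.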

Now the kernel offers easier opportunities to do things like directory content caching, which currently aren't exposed to user space via any API - but once you have to do something like directory content caching it's also possible to expose that feature to userspace via an API.
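For illustration only, a user-space directory content cache could look roughly like the sketch below. The class `DirCache` is hypothetical, and using the directory's mtime as the invalidation signal is an assumption made to keep the example short: it can miss changes that land within the filesystem's timestamp granularity, and it is not a description of how Samba or any kernel client does it.

```python
import os
import tempfile


class DirCache:
    """Hypothetical cache of folded directory listings, keyed by path."""

    def __init__(self):
        # directory path -> (mtime_ns, {folded name: real name})
        self._cache = {}

    def lookup(self, directory, wanted):
        """Return the real name matching `wanted` case-insensitively, or None."""
        mtime = os.stat(directory).st_mtime_ns
        cached = self._cache.get(directory)
        if cached is None or cached[0] != mtime:
            # Rescan only when the directory appears to have changed.
            # If two names fold to the same key, one of them wins arbitrarily.
            with os.scandir(directory) as entries:
                folded = {e.name.casefold(): e.name for e in entries}
            cached = (mtime, folded)
            self._cache[directory] = cached
        return cached[1].get(wanted.casefold())


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "Music"), "w").close()
    cache = DirCache()
    found = cache.lookup(d, "music")    # scans the directory once
    missing = cache.lookup(d, "video")  # served from the cached listing
    print(found, missing)
```

The first lookup pays for a full scan. Later lookups in an unchanged directory are dictionary hits, which is the kind of saving a cache exposed through an API would give user space.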

> Else implementations of stuff with human-derived semantics should move out of the kernel, but moving SMB into userspace would of course be ridiculous

Samba would beg to disagree of course :-).