by vidarh 1753 days ago

Or just use "find -maxdepth 1" or "ls -f"

What makes "ls" appear slow on large directories is that it reads the entire directory before printing anything, so that it can sort the contents. "-f" turns off that behaviour (it also changes some other switches; EDIT: another relevant change might be that ls by default also lays its output out in columns, which requires knowing the length of every filename), and "find" does not sort.
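A minimal Python sketch of the same difference, assuming a Unix-like system. The helper names `first_entries` and `sorted_entries` are made up for illustration. `os.scandir` hands back entries lazily in directory order, so the first few names are available without reading the rest, whereas sorting requires reading everything first.

```python
import os
import tempfile
from itertools import islice


def first_entries(path, limit=10):
    """Return up to `limit` names in directory order, without sorting.

    os.scandir yields entries lazily, so this stops reading once
    `limit` names have been collected.
    """
    with os.scandir(path) as it:
        return [entry.name for entry in islice(it, limit)]


def sorted_entries(path):
    """Return all names sorted; the whole directory must be read first."""
    return sorted(os.listdir(path))


if __name__ == "__main__":
    # Hypothetical demo directory with 1000 empty files.
    with tempfile.TemporaryDirectory() as d:
        for i in range(1000):
            open(os.path.join(d, f"file{i:04d}"), "w").close()
        print(first_entries(d, 5))    # arbitrary directory order
        print(sorted_entries(d)[:5])  # sorted, after reading all 1000 names
```

This only illustrates why unsorted output can start immediately; it does not reproduce what "ls" or "find" do internally.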

(Don't want to fault OP for writing something to do this. It's not obvious if you've suddenly had to deal with a huge directory and aren't familiar with ways to list its contents faster. I remember my own frustration over how slow that could be until I learned of those options, especially on ext2fs, where it was really awful.)

saddlerustle 1753 days ago

ls from GNU coreutils uses readdir(3), which only populates a small buffer of dirents inside glibc from the result of getdents. So running ls in any way on a very large directory makes many syscalls and calls into the VFS, which will be extremely slow if you're using FUSE or a network filesystem.

vidarh 1753 days ago

It uses a 32K buffer, sure (at least that's what strace shows me on my machine; EDIT: glibc dynamically chooses buffer size in opendir() based on reported st_blksize, up to 1MB)

But I've run ls on systems with several orders of magnitude slower drives than what we're used to today, and it really isn't generally what causes people to complain of ls being slow on large directories in practice.

Even with ext2fs, which was notoriously bad at handling large directories, exported via NFS over 10 Mb/s Ethernet and backed by '90s-era IDE drives, the buffer size was rarely enough of a problem to matter once you turned off the sorting.

(To be clear, I'm not saying it won't ever make a difference; but in practice, over decades of running into this complaint, it's just not been the issue. Most of the time people's problem is interactive use, where they have to wait for ls to read the whole dir before they start getting results, not total throughput.)

saddlerustle 1753 days ago

I mean, I only say this bc I wrote a tool like OP's back in the day and got a ~5x speedup over `ls -f` in a particular environment, but sure YMMV.

vidarh 1753 days ago

I edited my comment. Current glibc will dynamically choose buffer sizes up to 1MB of dirent structures depending on reported st_blksize. No idea when it started doing that, so it might well be it does better now.
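To see the value glibc would be working from, a small Python sketch can print the st_blksize reported for a directory. It assumes a Unix-like system (the field is not available everywhere), and `reported_blksize` is a made-up helper name. It shows only what the filesystem reports, not the buffer size glibc ends up allocating.

```python
import os


def reported_blksize(path="."):
    """Return st_blksize for `path`, the filesystem's preferred I/O block size.

    This is only the value that stat() reports; how a C library turns it
    into a readdir buffer size is not modelled here.
    """
    return os.stat(path).st_blksize


if __name__ == "__main__":
    print(reported_blksize("."))
```

Comparing the output on a local disk against a FUSE or NFS mount gives a rough idea of how much the reported value varies between filesystems.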

I'm sure there are circumstances where a 5x speedup will matter, but most of the time where people use "ls" it's interactively on the command line and it's just not been my experience that it matters in the circumstances where I've had to help people with this. It's more of a "we're right in the middle of something and can't get any results back" kind of situation. I'd say 9 out of 10 times I've had people bring up an issue like this it's because they're uncomfortable with "find" and are trying to get a list of files to do something with that they could easily do directly with "find" and/or where "ls -f" is more than sufficient.

But as you say, YMMV.

fragmede 1753 days ago

It used to be a fixed 4k buffer. Later versions of ls (coreutils) raised that, and made it dynamic.

CyberShadow 1753 days ago

Yes. You may also find that `ls` is slower than `ls|cat`, because the former needs to call `stat` for every directory entry to colorize it.

vidarh 1753 days ago

getdents() on Linux (since 2.6.4) includes a file type (block/character device, directory, pipe, symlink, file, unix domain socket) in the dirent structure directly, so that shouldn't be necessary - certainly ls on my machine just calls getdents() and still produces the type indication.

(Also, "-f" turns off the colors and type indication in any case, making it a more portable way of ensuring you get fast output even on systems that don't embed the type in getdents() results.)
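The type embedded in each directory entry is also visible from Python, which makes for a short sketch (`classify` is a made-up helper name). `os.scandir` returns `DirEntry` objects whose `is_dir()`, `is_file()` and `is_symlink()` methods can usually answer from the entry itself without an extra stat call, assuming the platform and filesystem supply the type; otherwise Python falls back to calling stat.

```python
import os
import tempfile


def classify(path):
    """Map each name in `path` to a coarse type using DirEntry methods."""
    kinds = {}
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_symlink():
                kinds[entry.name] = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kinds[entry.name] = "directory"
            elif entry.is_file(follow_symlinks=False):
                kinds[entry.name] = "file"
            else:
                # Devices, pipes, sockets and so on.
                kinds[entry.name] = "other"
    return kinds


if __name__ == "__main__":
    # Hypothetical demo directory with one entry of each common type.
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "sub"))
        open(os.path.join(d, "plain"), "w").close()
        os.symlink("plain", os.path.join(d, "ln"))
        print(classify(d))
```

Running this under strace is one way to check whether any per-entry stat calls happen on a given system.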

publicarray 1751 days ago

Instead of "find -maxdepth 1" there is "fd -I -d 1", which in my experience is faster as it uses multiple threads: https://crates.io/crates/fd-find
