Nice writeup. I suspect you're measuring the cost of abstraction. Specifically, routines that can handle lots of things (like locale-based strings and UTF-8 characters) have more to do before they can produce results. This was something I ran into head-on at Sun when we did the I18N[1] project.
In my experience there was a direct correlation between the number of different environments where a program would "just work" and its speed. The original UNIX ls(1), which had maximum-sized filenames, no pesky characters allowed, everything representable in 7-bit ASCII, and only the 12 bits of metadata that God intended[2], was really quite fast. Add things like a VFS that maps the source file system into the parameters of the "expected" file system, and that adds delay. Mapping different character sets? Adds delay. Colors for the display? Adds delay. Small costs that add up.
1: The first time I saw a long word like 'internationalization' reduced to first and last letter and the count of letters in between :-).
2: Those being Read, Write, and eXecute for user, group, and other, setuid, setgid, and 'sticky' :-)
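For anyone who wants to see the 12 bits from footnote 2 spelled out, here is a minimal sketch using Python's standard `stat` constants. The helper name `describe_mode` is made up for illustration; it is not from any tool discussed in this thread.

```python
import stat

# The three "special" bits that sit above the nine rwx permission bits.
SPECIAL_BITS = [
    (stat.S_ISUID, "setuid"),
    (stat.S_ISGID, "setgid"),
    (stat.S_ISVTX, "sticky"),
]


def describe_mode(mode):
    """Return (rwx string, list of special bits) for the low 12 mode bits.

    `describe_mode` is a hypothetical helper, not part of any library.
    """
    bits = mode & 0o7777  # drop the file-type bits, keep the 12 mode bits
    perms = ""
    for shift in (6, 3, 0):  # user, group, other
        triplet = (bits >> shift) & 0o7
        perms += "r" if triplet & 4 else "-"
        perms += "w" if triplet & 2 else "-"
        perms += "x" if triplet & 1 else "-"
    special = [name for flag, name in SPECIAL_BITS if bits & flag]
    return perms, special


if __name__ == "__main__":
    for mode in (0o4755, 0o1777, 0o100644):
        print(oct(mode), describe_mode(mode))
```

Passing `os.stat(path).st_mode` works too, since the mask discards the file-type bits.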
I did not mean to imply I invented it; that was just the first time I had seen that form of shorthand. Whoever at Sun called it that was no doubt aware of other efforts. I was but a lowly entry-level programmer in the systems group. That was long before I had acquired the power to name things at Sun :-)
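The shorthand itself (first letter, count of letters in between, last letter) is easy to sketch. The function name `numeronym` is just an illustrative choice, and the handling of very short words is an assumption.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter.

    Words of three letters or fewer are returned unchanged (an assumption;
    there is nothing in between worth counting).
    """
    if len(word) <= 3:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


if __name__ == "__main__":
    for word in ("internationalization", "localization"):
        print(word, "->", numeronym(word))
```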
How much of the speedup over GNU ls is due to lacking localization features? Your results table is pretty much consistent with my local observations: in a dir with 13k files, `ls -al` needs 33ms. But 25% of that time is spent by libc in `strcoll`. Under `LC_ALL=C` it takes just 27ms, which is getting closer to the time of your program.
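A rough way to get a feel for the collation cost outside of `ls` is to sort the same names twice, once by code point and once through `locale.strcoll`. This is only a sketch: the generated file names are made up, and Python's own overhead dominates the absolute numbers, so only the relative gap is informative. It is not a measurement of GNU ls.

```python
import functools
import locale
import time


def make_names(n):
    # Hypothetical file names; mixed case so collation order can differ
    # from plain code point order under a non-C locale.
    return [
        f"{'File' if i % 2 else 'file'}_{(i * 7919) % n:05d}.txt"
        for i in range(n)
    ]


def sort_bytewise(names):
    # Plain code point comparison, roughly what LC_ALL=C gives you.
    return sorted(names)


def sort_with_collation(names):
    # Locale-aware comparison through libc's collation routine.
    return sorted(names, key=functools.cmp_to_key(locale.strcoll))


def best_time(fn, names, repeat=5):
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(names)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    try:
        locale.setlocale(locale.LC_COLLATE, "")  # use the environment's locale
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    names = make_names(13_000)
    print("code point sort:", best_time(sort_bytewise, names))
    print("strcoll sort:   ", best_time(sort_with_collation, names))
```

Running it once normally and once with `LC_ALL=C` in the environment shows how much of the gap comes from the locale's collation rules rather than from the comparison callback itself.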
I didn't include `busybox` in my initial table, so it isn't in the blog post, though the repo has the data. I am 99% sure busybox does not have locale support, so I think GNU ls without locale support would probably land closer to busybox.
Locales also bring in a lot more complicated sorting - so that could be a factor also.
I'm curious how lsr compares to `bfs -ls`, for example. bfs only uses io_uring when multiple threads are enabled, but maybe it's worth using it even for `bfs -j1`.
Oh that's cool. `find` is another tool I thought could benefit from io_uring like `ls`. I think it's definitely worth enabling io_uring for single threaded applications for the batching benefit. The kernel will still spin up a thread pool to get the work done concurrently, but you don't have to manage that in your codebase.
Curious: Why? Is it not a good fit for what ripgrep does? Isn't the sort of "streaming" "line at a time" I/O that ripgrep does a good fit for async io?
For many workloads, ripgrep spends the vast majority of its time searching through files.
But more practically, it would be a terror to implement. ripgrep is built on top of platform specific standard file system APIs. io_uring would mean a whole heap of code to work with a different syscall pattern in addition to the existing code pattern for non-Linux targets.
So to even figure out whether it would be worth doing that, you would need to do a whole bunch of work just to test it. And because of my first point above, there is a hard limit on how much of an impact it could even theoretically have.
Where I would expect this to help is to batch syscalls during directory tree traversal. But I have no idea how much it would help, if at all.
At those time scales, you would be better off using `tim` ( https://github.com/c-blake/bu/blob/main/doc/tim.md ) than hyperfine { and not just because that is your name! Lol. That is just a happy coincidence by clipping one letter off of the word "time". :-) } even though being in Nim might make it more of a challenge.
This is fantastic stuff. I'm working on a C++ project right now with an eye to eventual migration, in whole or in part, to Zig. My little `libevring` thing is pretty young and I'd be very open to replacing it with `ourio`.
What's your feeling on having C/C++ bindings in the project as a Zig migration path for such projects?