HN Mirror


by CyberDildonics 955 days ago

> This is simply done by walking the filesystem.

This is the part I'm wondering about. Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.

Are you just using stat from C to walk the filesystem or are you doing something else?

I've used sqlite to cache filesystem results and it is also extremely fast once everything is in there, but I think a lot of approaches should work once the file attributes are cached.
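To make the sqlite idea concrete, here is a minimal sketch of such a cache. It assumes Python's standard `sqlite3` module; the `files` table and the `build_cache` and `search` helpers are hypothetical names invented for illustration, not how Everything or any other tool does it.

```python
import os
import sqlite3


def build_cache(root, db_path=":memory:"):
    """Walk `root` once and store basic attributes of every entry in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL, is_dir INTEGER)"
    )
    rows = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        # One stat per entry, without following symlinks.
                        st = entry.stat(follow_symlinks=False)
                        is_dir = entry.is_dir(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    rows.append(
                        (entry.path, entry.name, st.st_size, st.st_mtime, int(is_dir))
                    )
                    if is_dir:
                        stack.append(entry.path)
        except OSError:
            continue  # directory vanished or is unreadable
    with conn:
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def search(conn, needle):
    """Return paths whose file name contains `needle` as a literal substring."""
    # Escape LIKE wildcards so the needle is matched literally.
    escaped = needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? ESCAPE '\\' ORDER BY path",
        (f"%{escaped}%",),
    )
    return [row[0] for row in cur]
```

The slow part is still the initial walk. Once the rows are in the table, a substring query is a plain table scan over data that is already in memory or in the page cache.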

3 comments

soundarana 955 days ago

On NTFS Everything reads the MFT, which is sequential on disk.

Then on subsequent starts it reads the NTFS update journal to see what changed.


lelanthran 954 days ago

> Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.

The last time I checked, Everything worked by using the AV calls Microsoft provides; any time a file is written, the name (and other metadata) can be written to a log that Everything can check once every 5 seconds or so.

If I thought there was any money at all to be made from providing an Everything equivalent[1] on Linux, I'd spend the week or so to write it, but as far as I can tell there's just no market for something like this.

[1] By that I mean "similar in performance and query capabilities"; I would obviously need more time than that to hook into the common file-open dialog widgets (Gnome/KDE/etc) so that users could run their queries straight from existing file dialog widgets.


CyberDildonics 954 days ago

What you are talking about is file change notifications. A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.

https://learn.microsoft.com/en-us/windows/win32/devnotes/mas...


lelanthran 954 days ago

> What you are talking about is file change notifications. A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.

Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.

TBH, if I thought I could make even $100 in donations from this, I'd start it tomorrow, but absolutely no one misses ultra-fast searching when they don't have it.

Even on Windows, the number of users who go out and look for something that searches as fast as Everything is a rounding error - statistical noise. Now go and divide that fractional percentage of Everything users on Windows by 100 to get the number of Linux users who might use this.


wander_homer 954 days ago

> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.

Please enlighten us how that would work.

> TBH, if I thought I could make even $100 in donations from this, I'd start it tomorrow, but absolutely no one misses ultra-fast searching when they don't have it.

You can easily make $100 in donations with this. I did it with this piece of software while it was still less performant and powerful and without an official release and by only mentioning it on one or two forums.

If the software delivers what you're saying, I guarantee you that this will lead to more than $100 per month in donations.


lelanthran 953 days ago

Firstly, I appreciate you taking the time to engage with me. I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.

My point was that the incentive to produce something like `Everything` on Linux just isn't aligned with what the target market wants or needs. I think that what you have produced satisfies what the target market wants.

> You can easily make $100 in donations with this.

Honestly, I'm still very skeptical that even a $100 target is possible. I have to also admit that I've looked at stuff in the past, gone "No one could possibly want that, at that price point" and been horribly wrong.

I feel like I should test the claim of how many people want an `Everything` equivalent on Linux: I'll make it, package it with an MVP GUI, and mention it on a few forums in addition to posting a Show HN here.

For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.

I'd also like to know how you went about benchmarking performance against existing stuff for your project; for comparison against `Everything` I was thinking that the metric to beat is delta between file creation/removal time and the time that the file shows up in the results set (or index).

Like the other responder here, I also think that once something is in the index, retrieval time should be almost instant, so there's not much point in benchmarking "How long does it take to update results after every keypress" once that metric falls below 100ms or so.


wander_homer 953 days ago

> I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.

Not at all. I'm just incredibly curious about how you'd solve the problem of creating an index of a filesystem as fast as Everything does, because I've thought and read a lot about it over the last couple of years and haven't found any solution at all, nor have I found any other software that achieves something like that on Linux systems.

> For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.

One post on the Arch Linux forum and one on the r/linux subreddit. From there I got enough users to get more than $100 in donations. Nowadays it's obviously more.

> I'd also like to know how you went about benchmarking performance against existing stuff for your project;

Everything has an extensive debug mode with detailed performance information about pretty much everything it's doing. That's how I know exactly how long it took to create the index, perform a specific search, update the index with x file creations, deletions or metadata changes etc.

> for comparison against `Everything` I was thinking that the metric to beat is delta between file creation/removal time and the time that the file shows up in the results set (or index).

That's not particularly interesting, because it's quite straightforward to achieve similar performance.

The crucial metric is how long it takes to create the index initially and then to update it when the application starts (i.e. finding all changes to the filesystem that happened while the application wasn't running). That's where Everything excels, and it's something for which neither I nor others have found a solution on non-Windows systems (without making significant changes to the kernel, of course). The best and pretty much only solution I'm aware of is the brute-force method of walking the filesystem and calling stat, which is obviously much slower.
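For illustration, the brute-force startup update could look roughly like the sketch below: walk the tree again, lstat everything, and diff the result against the state saved at the last shutdown. `snapshot` and `diff_snapshots` are hypothetical helper names, and comparing size plus mtime is an assumption about what counts as "changed", not a description of any particular tool.

```python
import os


def snapshot(root):
    """Map every path under `root` to its (size, mtime_ns), using lstat."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry disappeared between listing and lstat
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return (created, deleted, modified) path lists between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return created, deleted, modified
```

The diff itself is cheap. The cost is that `snapshot` has to touch every entry on the volume again, which is exactly the work a change journal avoids.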


CyberDildonics 954 days ago

> It can be done as fast as, or faster than, `Everything`.

Then how would you do it? That's what I'm asking: how would you get the file attributes off the disk on Linux as fast as Everything does? Once you get them off the disk, any modern computer can burn through them, but getting that data into memory in the first place is the problem.


wander_homer 955 days ago

Yes, it's simply using stat on every file/folder. There's probably some room for improvement there with clever parallelization, but it'll remain a bottleneck.
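As a rough idea of what such parallelization might look like, here is a minimal sketch that scans directories on a thread pool. It is written in Python for brevity and is not the actual implementation; `scan_dir` and `parallel_walk` are hypothetical names, and whether this beats a single-threaded walk depends heavily on the storage device and on what is already in the page cache.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def scan_dir(path):
    """lstat every entry of one directory; return (records, subdirectories)."""
    records, subdirs = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    st = entry.stat(follow_symlinks=False)
                    is_dir = entry.is_dir(follow_symlinks=False)
                except OSError:
                    continue  # entry vanished or is unreadable
                records.append((entry.path, st.st_size, st.st_mtime))
                if is_dir:
                    subdirs.append(entry.path)
    except OSError:
        pass  # directory vanished or is unreadable
    return records, subdirs


def parallel_walk(root, workers=8):
    """Walk `root`, scanning up to `workers` directories concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(scan_dir, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                records, subdirs = future.result()
                results.extend(records)
                # Each discovered subdirectory becomes its own task.
                pending.update(pool.submit(scan_dir, d) for d in subdirs)
    return results
```

Even in the best case this only overlaps the per-directory latency. Every entry still needs its own stat call, so the bottleneck stays.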

Everything parses a file called the MFT to build its index. This is much more efficient, but unfortunately this file is only present on NTFS volumes, which makes the approach very useful on Windows systems but not so much everywhere else.

Another benefit you get on Windows is the USN journal, which allows Everything to keep the index updated much more efficiently.
