by lelanthran 909 days ago
Firstly, I appreciate you taking the time to engage with me. I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.

My point was that the incentive to produce something like `Everything` on Linux just isn't aligned with what the target market wants or needs. I think that what you have produced satisfies what the target market wants.

> You can easily make $100 in donations with this.

Honestly, I'm still very skeptical that even a $100 target is possible. I have to also admit that I've looked at stuff in the past, gone "No one could possibly want that, at that price point" and been horribly wrong.

I feel like I should test the claim of how many people want an `Everything` equivalent on Linux: I'll make it, package it with an MVP GUI, and mention it on a few forums in addition to posting a Show HN here.

For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.

I'd also like to know how you went about benchmarking performance against existing stuff for your project; for comparison against `Everything` I was thinking that the metric to beat is the delta between file creation/removal time and the time that the file shows up in the results set (or index).

Like the other responder here, I also think that once something is in the index, retrieval time should be almost instant, so there's not much point in benchmarking "How long does it take to update results after every keypress" once that metric falls below 100ms or so.
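
To make that concrete, here is a minimal Python sketch of a query against an in-memory index. It assumes the index is just a list of absolute paths and that a query matches case-insensitively against the final path component; both are assumptions made here for illustration, and `search` is a hypothetical name, not part of any existing tool.

```python
def search(paths, query):
    """Return every indexed path whose final component contains `query`.

    `paths` is assumed to be an in-memory list of absolute paths that
    use "/" as the separator.
    """
    needle = query.casefold()
    # Naive linear scan: compare the query against the last path component.
    return [p for p in paths if needle in p.rsplit("/", 1)[-1].casefold()]


if __name__ == "__main__":
    index = ["/etc/hosts", "/home/u/Hosts.bak", "/home/hostess/x"]
    print(search(index, "host"))
```

A linear scan like this is only the naive baseline; the point is that no filesystem access happens at query time.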


> I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.

Not at all, I'm just incredibly curious about how you'd solve the issue of creating an index of a filesystem as fast as Everything, because I've thought and read a lot about it in the last couple of years and haven't found any solution at all, nor have I found any other software which achieves something like that on Linux systems.

> For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.

One post on the Arch Linux forum and one on the r/linux sub on Reddit. From there I got enough users to get more than $100 in donations. Nowadays it's obviously more.

> I'd also like to know how you went about benchmarking performance against existing stuff for your project;

Everything has an extensive debug mode with detailed performance information about pretty much everything it's doing. That's how I know exactly how long it took to create the index, perform a specific search, update the index with x file creations, deletions or metadata changes etc.

> for comparison against `Everything` I was thinking that the metric to beat is the delta between file creation/removal time and the time that the file shows up in the results set (or index).

That's not particularly interesting, because it's quite straightforward to achieve similar performance.

The crucial metric is how long it initially takes to create the index and then update it when the application starts (i.e. finding all changes to the filesystem which happened while the application wasn't running). That's where Everything excels and to which I and others haven't found a solution on non-Windows systems (without making significant changes to the kernel of course). The best and pretty much only solution I'm aware of is the brute force method of walking the filesystem and calling stat, which obviously is much slower.
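
For reference, the brute-force method could be sketched roughly like this in Python. The function name `build_index` and the stored tuple layout are hypothetical, chosen only for illustration; this is not how any particular tool implements it.

```python
import os


def build_index(root):
    """Walk `root` and record (size, mtime_ns, is_dir) for every entry."""
    index = {}
    stack = [root]  # explicit stack instead of recursion
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    try:
                        # Do not follow symlinks, so the walk cannot loop.
                        st = entry.stat(follow_symlinks=False)
                        is_dir = entry.is_dir(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    index[entry.path] = (st.st_size, st.st_mtime_ns, is_dir)
                    if is_dir:
                        stack.append(entry.path)
        except OSError:
            continue  # directory vanished or permission was denied
    return index


if __name__ == "__main__":
    snapshot = build_index(os.getcwd())
    print(len(snapshot), "entries indexed")
```

The cost here grows with the total number of files and directories on the volume, not with the number of changes, which is the whole problem.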

> The crucial metric is how long it initially takes to create the index and then update it when the application starts (i.e. finding all changes to the filesystem which happened while the application wasn't running)

That's what I meant by "the delta between file creation/removal time and the time that the file shows up in the results set (or index)."

Basically, how fast can we update the index?

> That's where Everything excels and to which I and others haven't found a solution on non-Windows systems (without making significant changes to the kernel of course).

I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.

Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.

As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.

I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)

> Basically, how fast can we update the index?

The two core issues are:

1) How do you quickly get a list of all files and their attributes from the filesystem, without recursively visiting all directories? The kernel has no such functionality and neither do most filesystems (except NTFS with the MFT, which is how Everything solves that).

2) How do you know which files have been modified on a filesystem since it was last mounted on the system or since your monitoring daemon/application was running the last time? This information also needs to be stored persistently on the filesystem (like the USN journal, which Everything is using) if you want to avoid slow recursive traversals.
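
Without such a persistent journal, the fallback for issue 2 is to compare a saved snapshot against a fresh traversal. A rough Python sketch, assuming each snapshot is a dict mapping a path to a `(size, mtime_ns)` tuple (a hypothetical layout used only for illustration, not any tool's actual format):

```python
def diff_snapshots(old, new):
    """Compare two {path: (size, mtime_ns)} snapshots.

    Returns three sorted lists: created, removed and modified paths.
    """
    created = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A path present in both snapshots counts as modified if its metadata differs.
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, removed, modified


if __name__ == "__main__":
    before = {"/a": (1, 100), "/b": (2, 200), "/c": (3, 300)}
    after = {"/a": (1, 100), "/b": (5, 250), "/d": (4, 400)}
    print(diff_snapshots(before, after))
```

Note that a rename shows up as one removal plus one creation, and that producing the `new` snapshot still requires the slow recursive traversal, so this only detects changes; it does not make detecting them fast.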

> I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.

Well, the problem is that my kernel isn't the only kernel that changes the filesystems I'm using. Hence a kernel module only works if your system is the only one that's modifying the data you're working with, or if most other systems use the same kernel module, which isn't realistic.

> Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.

Everything uses a daemon as well and it's not a solution to that issue, because somehow the daemon also has to get the list of files/folders and their attributes out of a filesystem without walking it. How else would the daemon know which files belong to the volume which was just mounted moments ago?

> As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.

Walking the filesystem shouldn't be done at all, because it's just too slow.

> I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)

Of course, I'd appreciate that.

> How else would the daemon know which files belong to the volume which was just mounted moments ago?

I wasn't intending to include transient filesystems in the index.

> Of course, I'd appreciate that.

Gimme about a week :-)

> I wasn't intending to include transient filesystems in the index.

There's absolutely no difference between transient and persistent filesystems with regard to that problem. Every time a filesystem gets mounted, you have no idea what you're going to get. The last time it was mounted there could have been 13 million files on it and now when you mount it all of them could be gone or renamed. This is also super common on modern Linux systems, because many of them boot into a minimal boot environment to perform system updates and hence alter the filesystem heavily while daemons such as a filesystem monitor aren't running.

So the question is: how do you know whether /some/random/file has been modified while your daemon or application wasn't running or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, which also needs to be orders of magnitude faster, then you'll never match the performance of Everything. And that's not some uncommon situation, because your daemon/app has to figure that out every time it gets launched for every file and folder.

> So the question is: how do you know whether /some/random/file has been modified while your daemon or application wasn't running or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, which also needs to be orders of magnitude faster, then you'll never match the performance of Everything.

Well, my intention is to match the feature list of Everything, but on Linux, and as far as I knew, Everything did not have full support for external drives - you'd have to convert them to NTFS, or add them to be indexed manually.

The use-case I've seen for Everything has always been for a local user searching their local PC; I wasn't even sure until now that Everything can sometimes search transient filesystems, because no one I ever saw using it used it for files on a transient filesystem.

You're correct: what I cannot do is monitor transient filesystems. But indexing permanent filesystems at a speed better than or equal to Everything is still better than anything I've used on Linux, many of which don't even search system files, never mind transient filesystems. And they all use the locate db, which is always a day or so out of date.

And yes, it can be done purely by monitoring filesystem changes. Sure, a full index needs to be built the first time, but that's a one-off cost. Index updates after that should be cheap enough to apply on each write/remove/move operation, so the index can be updated dozens of times per second.
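
A minimal Python sketch of that idea, assuming change events arrive as hypothetical `(kind, path, new_path)` values from some monitoring layer (on Linux that would presumably be inotify or fanotify, which this sketch does not implement). The class name `LiveIndex` and the event kinds are made up for illustration, and paths are assumed to be absolute with no trailing slash.

```python
class LiveIndex:
    """In-memory set of paths kept current by applying change events."""

    def __init__(self):
        self.paths = set()

    def apply(self, kind, path, new_path=None):
        """Apply one event: "create", "remove" or "move"."""
        if kind == "create":
            self.paths.add(path)
        elif kind == "remove":
            self.paths.difference_update(self._subtree(path))
        elif kind == "move":
            for old in self._subtree(path):
                self.paths.discard(old)
                # Re-root the entry under the new location.
                self.paths.add(new_path + old[len(path):])
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    def _subtree(self, path):
        # The path itself plus everything below it. This is a linear scan;
        # a tree-shaped index would avoid scanning every entry.
        prefix = path + "/"
        return [p for p in self.paths if p == path or p.startswith(prefix)]


if __name__ == "__main__":
    idx = LiveIndex()
    for p in ("/home", "/home/u", "/home/u/a.txt"):
        idx.apply("create", p)
    idx.apply("move", "/home/u", "/home/v")
    print(sorted(idx.paths))
```

Each event touches only the affected entries, so the update cost is independent of how long ago the initial index was built. It does nothing for changes made while no monitor was running.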

For non-transient filesystems, performance should be the same as, or better than, Everything.