Hacker News
by burntsushi 1595 days ago
How did you implement gitignore filtering?

Very poorly.

Supporting ignore files in the root of the search is trivial.

However, supporting ignore files deeper in the tree is much harder, since find has no support for this.

When only a few directories have ignore files, it's best to find all the ignore files first and then generate one find command per directory that contains one.
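A minimal Python sketch of this "one find per ignore file" plan is below. It is not the author's code: the function names are hypothetical, discovery uses `os.walk` instead of find, only plain name globs such as `*.o` or `build/` are honored (negations and patterns containing a slash are dropped), and paths are assumed to contain no glob metacharacters, since they are reused as `-path` arguments. Each directory holding a `.gitignore` gets its own command with its inherited and own patterns, and the enclosing command prunes that directory so no file is listed twice.

```python
import fnmatch
import os


def read_simple_patterns(ignore_file):
    """Plain name globs from an ignore file (hypothetical simplification).

    Comments, blank lines and negations ('!foo') are skipped; a trailing '/'
    is stripped, and patterns that still contain a slash are dropped.
    """
    patterns = []
    with open(ignore_file) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue
            line = line.rstrip("/")
            if line and "/" not in line:
                patterns.append(line)
    return patterns


def plan_find_commands(root, ignore_name=".gitignore"):
    """Build one find argv per directory holding an ignore file, plus root."""
    root = os.path.normpath(root)
    effective = {}  # directory -> patterns in force there (inherited + own)
    anchors = []    # directories that get their own find command
    # Discovery pass (os.walk here; the thread does this step with find).
    for dirpath, dirnames, filenames in os.walk(root):
        patterns = list(effective.get(os.path.dirname(dirpath), []))
        if ignore_name in filenames:
            patterns += read_simple_patterns(os.path.join(dirpath, ignore_name))
            anchors.append(dirpath)
        elif dirpath == root:
            anchors.append(dirpath)
        effective[dirpath] = patterns
        # Don't look for ignore files inside directories that are ignored.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatchcase(d, p) for p in patterns)]

    commands = []
    for anchor in anchors:
        # Nested anchors are searched by their own command, so prune them here.
        tests = [["-path", a] for a in anchors if a.startswith(anchor + os.sep)]
        tests += [["-name", p] for p in effective[anchor]]
        if not tests:
            commands.append(["find", anchor, "-print"])
            continue
        expr = list(tests[0])
        for t in tests[1:]:
            expr += ["-o"] + t
        # find ANCHOR ( t1 -o t2 ... ) -prune -o -print
        # A matching directory is not descended into; a matching file is
        # simply not printed, because -prune is true and -print is skipped.
        commands.append(["find", anchor, "("] + expr + [")", "-prune", "-o", "-print"])
    return commands
```

Each argv list can be passed to `subprocess.run` as-is; typed into a shell, the parentheses would need escaping as `\(` and `\)`.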

If most directories have them, it's faster to exec find separately for each directory with -maxdepth 1.
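A minimal Python sketch of this per-directory approach, with hypothetical names and not taken from the author's tool: it execs `find DIR -mindepth 1 -maxdepth 1 -print0` once per directory and recurses in-process, carrying the inherited patterns down. It honors only plain name globs from each `.gitignore` (negations and patterns containing a slash are dropped), and `-mindepth`/`-maxdepth` are GNU/BSD extensions rather than POSIX. The price is one process spawn per directory visited.

```python
import fnmatch
import os
import subprocess


def list_one_level(directory):
    """Entries directly inside `directory`, from a single find exec."""
    out = subprocess.run(
        ["find", directory, "-mindepth", "1", "-maxdepth", "1", "-print0"],
        check=True, capture_output=True,
    ).stdout
    return [os.fsdecode(p) for p in out.split(b"\0") if p]


def load_patterns(directory, ignore_name=".gitignore"):
    """Simple name globs from directory/.gitignore, if present (simplified)."""
    path = os.path.join(directory, ignore_name)
    if not os.path.isfile(path):
        return []
    with open(path) as f:
        lines = [line.strip().rstrip("/") for line in f]
    return [p for p in lines
            if p and not p.startswith(("#", "!")) and "/" not in p]


def walk_with_ignores(root, inherited=(), ignore_name=".gitignore"):
    """Yield non-ignored file paths, exec'ing find once per directory."""
    patterns = list(inherited) + load_patterns(root, ignore_name)
    for path in sorted(list_one_level(root)):
        name = os.path.basename(path)
        if any(fnmatch.fnmatchcase(name, p) for p in patterns):
            continue  # an ignored file is skipped; an ignored dir is never entered
        if os.path.isdir(path) and not os.path.islink(path):
            yield from walk_with_ignores(path, patterns, ignore_name)
        else:
            yield path
```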

As for using find on its own, I don't bother with this, since I consider silently ignoring files a misfeature, and I don't use git at work anyway.

I see. I'm not even sure how implementing ignore files in the root is trivial without, say, using 'git' directly. The gitignore format is quite subtle.
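To illustrate a few of those subtleties, here is a minimal Python sketch (hypothetical helper names) that matches one path against a list of gitignore lines, following rules from the gitignore documentation: the last matching pattern wins, a leading `!` re-includes, a trailing `/` matches only directories, a slash at the start or in the middle anchors the pattern to the `.gitignore`'s directory, and `*` never crosses `/`. It is deliberately incomplete: `**`, backslash escapes, and the rule that a file cannot be re-included when a parent directory is excluded are not handled, and `rel_path` is assumed to be relative to the directory containing the ignore file, with `/` separators.

```python
from fnmatch import fnmatchcase


def matches(pattern, rel_path, is_dir):
    """Does one gitignore pattern (already without a leading '!') match?"""
    if pattern.endswith("/"):
        if not is_dir:
            return False  # trailing '/': directories only
        pattern = pattern.rstrip("/")
    if "/" in pattern:
        # A slash at the start or in the middle anchors the pattern to the
        # .gitignore's directory; compare component by component so that
        # '*' never matches across '/'.
        pat_parts = pattern.lstrip("/").split("/")
        path_parts = rel_path.split("/")
        return len(pat_parts) == len(path_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts))
    # No slash: match the final path component at any depth.
    return fnmatchcase(rel_path.rsplit("/", 1)[-1], pattern)


def is_ignored(rel_path, is_dir, lines):
    """Apply gitignore lines in order; the last matching pattern wins."""
    ignored = False
    for line in lines:
        line = line.rstrip()  # trailing whitespace dropped (escapes not handled)
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        if matches(pattern, rel_path, is_dir):
            ignored = not negated
    return ignored
```

For example, with the rules `*.log`, `!keep.log`, `build/`, `/TODO`, and `doc/*.txt`, the path `keep.log` is kept, a regular file named `build` is kept, `sub/TODO` is kept, and `doc/sub/a.txt` is kept, while `sub/dir/x.log` and `doc/a.txt` are ignored.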

In any case, that's all fine and good, but your tool is very clearly not a clone of ag or ripgrep IMO. Whether you consider it a misfeature or not, the "smart filtering" aspect is probably the defining quality of things like ack, ag and ripgrep. So if you don't have that, I don't think you can call it a clone IMO. The smart filtering feature is right up there next to performance in terms of what things users tend to like about these tools.

The other thing you're probably missing from a perf perspective, I think, is parallel directory traversal. Neither ack nor ag has this, but ripgrep does. ;-)
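ripgrep's parallel walker is written in Rust; the Python below is not that code, only a minimal sketch of the general idea with a hypothetical function name: a shared queue of directories drained by several worker threads, where listing one directory may enqueue more work for any thread. `Queue.join()` returns once every enqueued directory has been processed, after which one `None` sentinel per worker shuts the pool down. Ignore filtering is left out to keep the traversal visible, and whether threads speed up a Python walk depends on the GIL and the filesystem, so treat it as an illustration of the structure rather than a benchmark.

```python
import os
import queue
import threading


def parallel_walk(root, num_workers=4):
    """Return all file paths under root, listing directories on several threads."""
    pending = queue.Queue()
    pending.put(root)
    files = []
    lock = threading.Lock()

    def worker():
        while True:
            directory = pending.get()
            if directory is None:  # sentinel: shut this worker down
                pending.task_done()
                return
            try:
                with os.scandir(directory) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            pending.put(entry.path)  # more work for any thread
                        else:
                            with lock:
                                files.append(entry.path)
            except OSError:
                pass  # unreadable or vanished directory: skip it
            finally:
                # Subdirectories were queued before this call, so the
                # unfinished-task count cannot reach zero too early.
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    pending.join()  # every queued directory has been processed
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return files
```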

> The gitignore format is quite subtle.

It is indeed. I can't say that I have it 100% correct yet; if you know of a good set of tests, I could incorporate it.

> In any case, that's all fine and good, but your tool is very clearly not a clone of ag or ripgrep IMO. ... the "smart filtering" aspect is probably the defining quality of things like ack, ag and ripgrep. So if you don't have that, I don't think you can call it a clone IMO. The smart filtering feature is right up there next to performance in terms of what things users tend to like about these tools.

I did implement nested gitignore files; it just murders performance (in the best case, with warm caches, it's not quite 2x slower; some pathological cases can be much worse). This logically belongs in find, since it's part of directory traversal, and a generic "walk this directory tree, skipping gitignored files" would probably be useful for many other things.

This was strictly a clone of the silver searcher, because that was what was popular at the time (2016ish, I think?). It started out as a weekend project just for fun, and then I started adding more ag options. A few other features are still unimplemented, such as --vimgrep, --ackmate, --column, and --stats. Also, -g takes a glob rather than a PCRE; I could reasonably make it take a POSIX regex, but I think that would fool people into thinking it worked like ag's, whereas a glob is rarely mistaken for a PCRE.
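The glob-versus-regex point can be seen with Python's `fnmatch` and `re` (the latter standing in for PCRE here). Treating ag's `-g` as an unanchored regex search over the file name is an assumption of this sketch, and the helper names are hypothetical. The same string means different things in each syntax: `*.c` is a valid glob but not a valid regex, while the regex `main.c` also matches `mainxc` and `main.cpp`.

```python
import fnmatch
import re

NAMES = ["main.c", "mainxc", "main.cpp", "notes.txt"]


def glob_filter(pattern, names):
    """Whole-name glob match, as in shell globbing or `find -name`."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def regex_filter(pattern, names):
    """Unanchored regex search anywhere in the name (assumed -g semantics)."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


print(glob_filter("*.c", NAMES))      # ['main.c']
print(regex_filter("main.c", NAMES))  # ['main.c', 'mainxc', 'main.cpp']
try:
    regex_filter("*.c", NAMES)
except re.error as err:
    print("not a valid regex:", err)
```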

Gotcha. Fair enough!