by debacle 1341 days ago
What is ripgrep?

Edit: Because I'm on a Zoom call that will never end.

"ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep will respect gitignore rules and automatically skip hidden files/directories and binary files."

https://github.com/BurntSushi/ripgrep


A grep alternative that optimizes for performance: https://github.com/BurntSushi/ripgrep . There are detailed performance comparisons and discussions in the readme there.
A file searcher akin to grep, ack, or ag (aka the silver searcher). It's programmed in Rust, so it is decently fast, with good support for UTF-8.

Unfortunately it defaults to parsing a git tree's gitignore file and skipping over files listed in it.

It also defaults to ignoring hidden and binary files. That filtering is also, simultaneously, the thing folks cite as their favorite part about ripgrep.

The idea behind it is that it acts as a heuristic for reducing false positives in your search results. For example, ripgrep replaced several little grep wrapper scripts I had in ~/bin.

And fortunately the default behavior is easy to disable. `rg -uuu foo` will search the same stuff as `grep -r foo ./`, but will do it faster.
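To make the `-u` levels concrete, here is a rough sketch against a throwaway tree. The file names and the pattern `needle` are made up for illustration, and the per-flag comments reflect my understanding of the flags (`-u` stops honoring ignore files, `-uu` also searches hidden files, `-uuu` also searches binary files). The `rg` part only runs if ripgrep is installed.

```shell
# Scratch tree: one ordinary file, one ignored file, one hidden file.
demo=$(mktemp -d)
printf 'needle\n' > "$demo/src.txt"
printf 'needle\n' > "$demo/generated.log"
printf 'needle\n' > "$demo/.hidden.txt"
printf '*.log\n'  > "$demo/.ignore"      # ripgrep-style ignore rule

# grep -r applies no filtering: ignored and hidden files are all searched.
grep_hits=$(grep -rl needle "$demo")
echo "grep -r:"
echo "$grep_hits"

# Each extra -u removes one layer of ripgrep's default filtering.
if command -v rg >/dev/null 2>&1; then
    for flags in "" -u -uu -uuu; do
        echo "rg $flags:"
        rg $flags --files-with-matches needle "$demo"
    done
fi

rm -rf "$demo"
```

With no flags, ripgrep should report only `src.txt`; by `-uu` it should list the same three files as `grep -r`.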

I love that ripgrep honors .gitignore, but the fact that it skips hidden files is annoying because of questionable decisions from tool makers who insist their configuration files should be hidden files. It is especially infuriating when working with GitHub and GitLab configuration directories. On the other hand I never want ripgrep to enter the .git directory.

I recently came up with this alias to make ripgrep do what I want: do not skip hidden files, except for the .git directory:

alias rg="rg --hidden --glob '!.git/'"

(Note: if you try entering this alias interactively, you may have to escape the '!'...)
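If the '!' escaping gets annoying, one alternative is a shell function instead of an alias. This is just a sketch of the same idea, not something from the ripgrep docs: the '!' stays inside single quotes, so an interactive bash should not try history expansion on it, and `command rg` calls the real binary rather than recursing into the function.

```shell
# Same behavior as the alias: search hidden files, but never descend into .git.
rg() {
    # 'command' skips the function lookup and runs the rg found on PATH.
    command rg --hidden --glob '!.git/' "$@"
}
```

Drop it into ~/.bashrc or ~/.zshrc; `rg foo` then behaves like `rg --hidden --glob '!.git/' foo`.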

Yeah that alias is a good one, here are some other avenues:

* The repo can add a `.ignore` or a `.rgignore` whitelisting things like `.github`. ripgrep will pick up that whitelist automatically and search `.github` even though it's hidden. But this relies on the repo adding ripgrep-specific config files, which is maybe not so realistic. (Or not universal enough to rely upon.) But it could work fine for repos under your control.

* Add '!.github/' to, e.g., `~/.config/ripgrep/ignore`, and then add `alias rg="rg --ignore-file ~/.config/ripgrep/ignore"`. That will become a global ignore file (with low precedence) that will whitelist `.github` everywhere.
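A minimal sketch of the second approach, for anyone who wants to try it. The path is the one suggested above; `RG_IGNORE_FILE` is a made-up override variable so you can experiment somewhere else first, and the `grep` guard is only there so rerunning the snippet does not append the rule twice.

```shell
# Global ignore file; RG_IGNORE_FILE is a hypothetical override for testing.
ignore_file="${RG_IGNORE_FILE:-$HOME/.config/ripgrep/ignore}"
mkdir -p "$(dirname "$ignore_file")"

# Append the whitelist rule unless that exact line is already present.
grep -qxF -e '!.github/' "$ignore_file" 2>/dev/null \
    || printf '%s\n' '!.github/' >> "$ignore_file"

# For ~/.bashrc or ~/.zshrc: pass the file on every invocation.
alias rg='rg --ignore-file "$HOME/.config/ripgrep/ignore"'
```

Single quotes around the alias body mean `$HOME` is expanded when the alias is used, not when it is defined.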

I use ripgrep everywhere, not only in my own repositories, so the first approach won't work for me. The second one, on the other hand, sounds like a really good idea, going to look into it, thanks.

And of course, thanks for this wonderful tool!

If I'm understanding this correctly, ripgrep parses multiple possible configuration files and is still one of the fastest tools I've ever used? That is amazing.
TL;DR - yes. :-)

There is a ripgrep "config file" (not mentioned in my previous comment), but there is only one and you have to set it via RIPGREP_CONFIG_PATH.
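For reference, a sketch of what setting that up could look like. The file name `~/.ripgreprc` is just a choice made here (ripgrep only looks at whatever RIPGREP_CONFIG_PATH points to), and the two flags are the ones from the alias earlier in the thread. As far as I know the format is one argument per line, with `#` starting a comment.

```shell
# Pick a location; ~/.ripgreprc is an arbitrary choice for this sketch.
config_file="$HOME/.ripgreprc"

# Write the file only if it does not exist, so an existing config is kept.
if [ ! -e "$config_file" ]; then
    cat > "$config_file" <<'EOF'
# Search hidden files...
--hidden
# ...but never descend into .git.
--glob=!.git/
EOF
fi

# ripgrep reads the config only when this variable is set (put it in ~/.profile or similar).
export RIPGREP_CONFIG_PATH="$config_file"
```

Running `rg --no-config foo` should bypass the file for a single invocation.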

The things mentioned above are "ignore files," which are a sort of configuration for whitelisting and blacklisting files to search in a directory tree. And yes, you can splat them down into any directory, and if ripgrep enters that directory, it will read it and respect it. (Unless you tell it not to.)
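Here is a small sketch of "splatting" ignore files into a tree, using a scratch directory with invented file names. The top-level `.ignore` whitelists the hidden `.github` directory and the nested one blacklists a file only below `docs/`; the expected behavior in the comments follows the description above rather than anything verified here, and the `rg` calls run only if ripgrep is installed.

```shell
repo=$(mktemp -d)
mkdir -p "$repo/.github/workflows" "$repo/docs"
printf 'needle in workflow\n' > "$repo/.github/workflows/ci.yml"
printf 'needle in guide\n'    > "$repo/docs/guide.md"
printf 'needle in draft\n'    > "$repo/docs/draft.md"

# Top level: whitelist the hidden .github directory.
printf '%s\n' '!.github/' > "$repo/.ignore"
# Nested: blacklist drafts, read only when ripgrep enters docs/.
printf '%s\n' 'draft.md' > "$repo/docs/.ignore"

if command -v rg >/dev/null 2>&1; then
    echo "with ignore files:"
    (cd "$repo" && rg --files-with-matches needle)   # should list ci.yml and guide.md, skip draft.md
    echo "with --no-ignore:"
    (cd "$repo" && rg --no-ignore --files-with-matches needle)   # ignore files are not consulted
fi
```

The `--no-ignore` run is the "unless you tell it not to" case: the whitelist for `.github` stops applying too, so the hidden directory is skipped again.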

If there are a lot of files with a lot of patterns, indeed, that can wind up taking a chunk of time, not only for building the matchers for each ignore file, but also for actually matching them against every path. Sometimes it takes longer than not ignoring files at all! But if your ignore files are permitting ripgrep to skip GBs of data that GNU grep wouldn't otherwise skip, well, that's going to be a huge win no matter how you slice it.

ripgrep does use multi-threading as well, and it makes sure every ignore file is parsed and built only once. All other threads can then share the one single matcher.

> Unfortunately it defaults to parsing a git tree's gitignore file and skipping over files listed in it.

That's a feature.

Like, it's the entire point of ripgrep. It's designed to search through the things a developer actually cares about searching through.

If you actually want to search everything, just use grep.

"decently fast" is a significant understatement. It is likely the fastest similar tool. (`git grep` may win due to not listing the directory tree and packed files and GNU grep is very fast if you don't use Unicode, but other than that ripgrep wins).
Just use --hidden; people shouldn't be afraid of typing an additional word. I prefer the defaults to keep things clean.
> it defaults to parsing a git tree's gitignore file and skipping over files listed in it

Is that true? How could anybody think that this non-orthogonal monstrosity would make any sense?

That monstrosity is easily one of the two most popular reasons cited by folks as being the reason why they use/like ripgrep. (With the other reason being performance.)

Orthogonality is a means to an end, not an end in itself.

Because it suits the pragmatic 90% case and not some "what if" scenario. It has quite adequately outlined what files it does search and also how to override the defaults. If you use the CLI you can use --help as well and get the other options.
It's the correct default for me. I don't want to grep heinous output files by default, like my gigabytes of generated KML files. I appreciate that rg (and ag) ignores those files by default, and 90% of the people I show it to agree.
Because you hardly ever want to grep through your build artifacts or node modules? I don’t remember any one time I had to explicitly tell ripgrep to search through all the files in a repository
Ah, someone who hasn't experienced the... experience of working on a project that does very esoteric autotools magic.

I will agree that in a sane project setup you don't need to search through all files including build artifacts ignored by git.

Even in an insane setup you generally don't want to search both at once. I find the pattern of first grepping non-ignored files and then cd-ing into my build directory or whatever and rerunning more helpful anyway.
It’s what you usually want when searching code bases. It includes all your source code but excludes generated code and build artefacts.