by JackCh 2934 days ago
Is this competitive on speed with tools like The Silver Searcher (`ag`), or is the focus here on color?

As much as I love `ag`, I feel like ripgrep (https://github.com/BurntSushi/ripgrep) deserves mentioning when it comes to speed. If you haven't tried it, do it sooner rather than later.

Here's an excellent write-up on how it works, benchmarks, etc.: https://blog.burntsushi.net/ripgrep/

ripgrep is soooooo good. I have switched to it and will never look back.
A quick look at the source shows that it appears to be linear and just uses `strings.Contains` or `r.MatchString` on each line, so I don't think it has any of the optimizations that are built into `ag`.
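For anyone who hasn't opened the source, the per-line approach described above looks roughly like the sketch below. This is only an illustration, not the project's actual code: `literalMatcher`, `regexMatcher`, `matchLines`, and `grepString` are hypothetical names. The real standard-library pieces are `bufio.Scanner`, `strings.Contains`, and `(*regexp.Regexp).MatchString`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// literalMatcher (hypothetical) reports whether a line contains the literal s.
func literalMatcher(s string) func(string) bool {
	return func(line string) bool { return strings.Contains(line, s) }
}

// regexMatcher (hypothetical) reports whether a line matches expr.
func regexMatcher(expr string) (func(string) bool, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	return re.MatchString, nil
}

// matchLines (hypothetical) splits the input into lines first and then tests
// every line, so the total work is linear in the input size but pays the
// line-splitting cost on every line, matching or not.
func matchLines(r io.Reader, match func(string) bool) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if line := sc.Text(); match(line) {
			out = append(out, line)
		}
	}
	return out, sc.Err()
}

// grepString is a convenience wrapper for in-memory input. The error is
// dropped here for brevity; the scanner can fail on very long lines.
func grepString(input string, match func(string) bool) []string {
	lines, _ := matchLines(strings.NewReader(input), match)
	return lines
}

func main() {
	input := "alpha\nFunctionName here\nbeta\n"
	for _, l := range grepString(input, literalMatcher("FunctionName")) {
		fmt.Println(l)
	}
}
```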
That is correct. The project is in its early stages. I want to see what the community needs the most and shape the project towards that goal. On the other hand, I have tried to avoid optimisations until most of the functionality is implemented.
It's a very nice idea and you should be proud of what you've built, but my personal opinion is that speed is a core feature of `grep`.

A good place to start would be this: why GNU grep is fast[1] - Starting with the Boyer-Moore string search algorithm and reading through the optimizations done in GNU grep.

p.s. there's an implementation of Boyer-Moore hiding in Go's standard library.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...
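To give a feel for the idea, here is a minimal sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. It is neither GNU grep's code nor the implementation in Go's standard library, and `horspoolIndex` is a hypothetical name used for illustration.

```go
package main

import "fmt"

// horspoolIndex (hypothetical) returns the index of the first occurrence of
// pattern in text, or -1 if there is none. It compares the pattern against
// the current window from right to left and, on a mismatch, slides the
// window by an amount looked up from the byte under the window's last
// position. Long patterns therefore let it skip most of the input.
func horspoolIndex(text, pattern []byte) int {
	m, n := len(pattern), len(text)
	if m == 0 {
		return 0
	}
	if m > n {
		return -1
	}

	// shift[b] is how far the window may slide when byte b is aligned with
	// the last position of the pattern. Bytes not in the pattern allow a
	// full-length jump.
	var shift [256]int
	for i := range shift {
		shift[i] = m
	}
	for i := 0; i < m-1; i++ {
		shift[pattern[i]] = m - 1 - i
	}

	for pos := 0; pos+m <= n; {
		j := m - 1
		for j >= 0 && text[pos+j] == pattern[j] {
			j--
		}
		if j < 0 {
			return pos // every byte of the window matched
		}
		pos += shift[text[pos+m-1]]
	}
	return -1
}

func main() {
	text := []byte("hello FunctionName world")
	fmt.Println(horspoolIndex(text, []byte("FunctionName")))
	fmt.Println(horspoolIndex(text, []byte("missing")))
}
```

The full Boyer-Moore algorithm adds a second (good-suffix) table on top of this; the sketch leaves it out to stay short.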

Thanks mate, I will definitely have a read.
Note that you don't need Boyer-Moore for the common case. ripgrep, for example, will very rarely use Boyer-Moore. Its workhorse is much simpler and typically faster: https://github.com/rust-lang/regex/blob/master/src/literal/m...

In Go-land, you should be able to replace uses of memchr with IndexByte[1], which should be implemented in assembly on most platforms.

Of course, for any of this to have a big impact, you'll want to take Mike Haertel's advice on avoiding line breaking and stop using bufio.Scanner. :-)

[1] - https://golang.org/pkg/bytes/#IndexByte
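A rough sketch of what "avoid line breaking" can look like in Go: search the whole buffer for the needle, and only look for the surrounding newlines once there is a hit. `findLines` and `findLineStrings` are hypothetical names, and the sketch assumes the input is already in memory and the needle contains no newline. `bytes.Index`, `bytes.IndexByte`, and `bytes.LastIndexByte` are the real standard-library calls.

```go
package main

import (
	"bytes"
	"fmt"
)

// findLines (hypothetical) returns every line of buf that contains needle.
// Unlike a per-line scan, it never splits non-matching lines: line
// boundaries are located only around each hit.
// Assumption: needle is non-empty and contains no '\n'.
func findLines(buf, needle []byte) [][]byte {
	var lines [][]byte
	if len(needle) == 0 {
		return lines
	}
	pos := 0
	for pos < len(buf) {
		i := bytes.Index(buf[pos:], needle)
		if i < 0 {
			break
		}
		hit := pos + i

		// Walk back to the previous newline, or to the start of the buffer.
		start := bytes.LastIndexByte(buf[:hit], '\n') + 1

		// Walk forward to the next newline, or to the end of the buffer.
		end := len(buf)
		if j := bytes.IndexByte(buf[hit:], '\n'); j >= 0 {
			end = hit + j
		}

		lines = append(lines, buf[start:end])
		pos = end + 1 // resume after this line so it is reported once
	}
	return lines
}

// findLineStrings is a string-based wrapper for quick experiments.
func findLineStrings(text, needle string) []string {
	var out []string
	for _, l := range findLines([]byte(text), []byte(needle)) {
		out = append(out, string(l))
	}
	return out
}

func main() {
	text := "alpha\nfoo FunctionName bar\nbeta\nFunctionName"
	for _, l := range findLineStrings(text, "FunctionName") {
		fmt.Println(l)
	}
}
```

On input where matches are rare, almost all of the time is then spent inside `bytes.Index` rather than in splitting and copying lines.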

So far I've only been concerned with the code's simplicity, until I understand what needs to be done. This is not going to be grep or ripgrep. My intent was to make a tool I needed, so I started working on it. I thought someone else might like it, and now it is a joy to see people looking at the project.

There are a couple of places I wish I had done better. Using bufio.Scanner actually bothers me a lot. Also, the Read() method reads everything from all readers into a buffer instead of pulling only what it needs to check.

Thanks for the suggestions :)
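One possible shape for "pulling what it needs" instead of buffering everything is sketched below. It is a guess at a direction, not the project's code: `forEachChunk` and `chunkStrings` are hypothetical names, and the fixed chunk size is an assumption. `io.MultiReader` is the real standard-library call that chains several readers into one.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// forEachChunk (hypothetical) reads the given readers in order, at most
// chunkSize bytes at a time, and hands each chunk to visit. Memory use stays
// bounded by chunkSize no matter how large the inputs are.
func forEachChunk(readers []io.Reader, chunkSize int, visit func([]byte)) error {
	if chunkSize <= 0 {
		return fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	r := io.MultiReader(readers...)
	buf := make([]byte, chunkSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			visit(buf[:n]) // buf is reused, so visit must not retain the slice
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// chunkStrings is a wrapper for in-memory inputs; it copies every chunk.
func chunkStrings(inputs []string, chunkSize int) []string {
	readers := make([]io.Reader, len(inputs))
	for i, s := range inputs {
		readers[i] = strings.NewReader(s)
	}
	var chunks []string
	_ = forEachChunk(readers, chunkSize, func(b []byte) {
		chunks = append(chunks, string(b))
	})
	return chunks
}

func main() {
	for _, c := range chunkStrings([]string{"abcde", "fg"}, 4) {
		fmt.Printf("%q\n", c)
	}
}
```

A real searcher would also have to handle a match that straddles two chunks, which this sketch does not attempt.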

I'm okay with it not being as fast, because speed is not the goal here. The goal is rather highlighting specific patterns to make it easier to spot for the human eyes, especially when tailing log lines from your development webserver.
> "to make it easier to spot for the human eyes,"

I suppose in that sense it does aim to be fast. Fast for the human to parse.

I've got the linux-4.17 kernel tree around, 61,322 files.

My desktop is running ubuntu-18.04, is an i5-3570, and has a fairly quick intel SSD.

Running "blush -R -i FunctionName ." takes 15.090 seconds and finds two files.

Running "ag -i FunctionName" finds one file, missing one in .clang-format.

Running "ag -i -u FunctionName" finds two files and takes 0.64 seconds.

So somewhere around 20-25x faster.
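As a quick check of that estimate, using only the two timings quoted above (15.090 seconds for blush and 0.64 seconds for `ag -i -u`); `speedup` is just an illustrative helper:

```go
package main

import "fmt"

// speedup returns how many times faster the fast run is than the slow run.
func speedup(slowSeconds, fastSeconds float64) float64 {
	return slowSeconds / fastSeconds
}

func main() {
	// Timings taken from the comment above, in seconds.
	fmt.Printf("%.1fx\n", speedup(15.090, 0.64)) // prints 23.6x
}
```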

Thank you for doing the comparison. Would you do the same against the latest version (v0.5.0) please? Thank you.
I’m doubtful ag and rg use a lot of smart optimizations to get their speed.
Ummm... you're kidding, right?

I know ripgrep has a ton of fantastic optimizations by Burntsushi.

You might wanna check it out...before making such statements.

I could be wrong, but I read valarauca1's comment as "I’m doubtful. ag and rg use a lot of smart optimizations to get their speed."
lol people's reading comprehension is so bad sometimes
In fairness, it's the GP's fault on this occasion for not punctuating his or her post. Decarep just read the GP's post as it was literally written (I had to read it 3 times myself to gauge what I thought the post meant).
contextualization is a component of reading comprehension
Reading comprehension is difficult when there's missing punctuation. For example, "Let's eat grandma" versus "Let's eat, grandma".
contextualization is a component of reading comprehension