


	by JackCh 2934 days ago
	Is this competitive on speed with tools like The Silver Searcher (`ag`), or is the focus here on color?


lillesvin 2934 days ago

As much as I love `ag`, I feel like ripgrep (https://github.com/BurntSushi/ripgrep) deserves mentioning when it comes to speed. If you haven't tried it, do it sooner rather than later.

Here's an excellent write-up on how it works, benchmarks, etc.: https://blog.burntsushi.net/ripgrep/


VeejayRampay 2934 days ago

ripgrep is soooooo good. I have switched to it and will never look back.


mvkg 2934 days ago

A quick look at the source shows that it appears to be linear and just uses `strings.Contains` or `r.MatchString` on each line, so I don't think it has any of the optimizations that are built into `ag`.
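
For anyone who has not looked at the source, the per-line approach described above looks roughly like the sketch below. This is not the project's actual code: `grepLines` and its parameters are hypothetical names chosen for illustration. Only `bufio.Scanner`, `strings.Contains`, and `(*regexp.Regexp).MatchString` are real standard library calls.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// grepLines is a hypothetical helper, not code from the project. It splits
// text into lines and keeps every line that matches re (when re is non-nil)
// or contains literal (when re is nil). Every line is split out and checked
// on its own, so the line-splitting cost is paid even for lines that cannot
// possibly match.
func grepLines(text, literal string, re *regexp.Regexp) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if re != nil {
			if re.MatchString(line) {
				out = append(out, line)
			}
			continue
		}
		if strings.Contains(line, literal) {
			out = append(out, line)
		}
	}
	return out, sc.Err()
}

func main() {
	input := "GET /index 200\nPOST /login 500\nGET /health 200\n"

	// Literal search.
	lines, err := grepLines(input, "GET", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines)

	// Regular expression search.
	lines, err = grepLines(input, "", regexp.MustCompile(`\s5\d\d$`))
	if err != nil {
		panic(err)
	}
	fmt.Println(lines)
}
```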


arsham 2934 days ago

That is correct. The project is in its early stages. I want to see what the community needs the most and shape the project towards that goal. On the other hand, I have tried to avoid optimisations until most of the functionality is implemented.


ozkatz 2933 days ago

It's a very nice idea and you should be proud of what you've built, but my personal opinion is that speed is a core feature of `grep`.

A good place to start would be "why GNU grep is fast"[1]: start with the Boyer-Moore string search algorithm, then read through the optimizations done in GNU grep.

p.s. there's an implementation of Boyer-Moore hiding in Go's standard library.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...
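
To give a feel for the idea, here is a rough sketch of the Horspool simplification of Boyer-Moore in Go. It is not the code from GNU grep, nor the implementation in Go's standard library, and `horspoolIndex` is a name made up for this example. It uses only the bad-character rule: when the window does not match, the byte under the last pattern position decides how far the window can jump.

```go
package main

import "fmt"

// horspoolIndex returns the index of the first occurrence of pattern in
// text, or -1 if there is none. Hypothetical helper for illustration only.
func horspoolIndex(text, pattern []byte) int {
	m := len(pattern)
	if m == 0 {
		return 0
	}

	// shift[b] is how far the window may slide when byte b sits under the
	// last position of the pattern. Bytes that do not occur in the pattern
	// (ignoring its final byte) allow a full jump of m.
	var shift [256]int
	for i := range shift {
		shift[i] = m
	}
	for i := 0; i < m-1; i++ {
		shift[pattern[i]] = m - 1 - i
	}

	for pos := 0; pos+m <= len(text); {
		// Compare the window against the pattern from right to left.
		j := m - 1
		for j >= 0 && text[pos+j] == pattern[j] {
			j--
		}
		if j < 0 {
			return pos
		}
		// Skip ahead based on the byte aligned with the pattern's end.
		pos += shift[text[pos+m-1]]
	}
	return -1
}

func main() {
	text := []byte("static int FunctionName(void)")
	fmt.Println(horspoolIndex(text, []byte("FunctionName")))
	fmt.Println(horspoolIndex(text, []byte("missing")))
}
```

The longer the pattern, the larger the typical jump, which is why this family of algorithms can skip most of the input without looking at it.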


arsham 2933 days ago

Thanks mate, I will definitely have a read.


burntsushi 2933 days ago

Note that you don't need Boyer-Moore for the common case. ripgrep, for example, will very rarely use Boyer-Moore. Its workhorse is much simpler and typically faster: https://github.com/rust-lang/regex/blob/master/src/literal/m...

In Go-land, you should be able to replace uses of memchr with IndexByte[1], which should be implemented in assembly on most platforms.

Of course, for any of this to have a big impact, you'll want to take Mike Haertel's advice on avoiding line breaking and stop using bufio.Scanner. :-)

[1] - https://golang.org/pkg/bytes/#IndexByte
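
A minimal sketch of what "avoid line breaking" can look like in Go, assuming the whole input is already in a byte slice and the needle contains no newline. `matchingLines` is a hypothetical name, not anything from ripgrep or the project under discussion. The buffer is searched as a whole with `bytes.Index`, and line boundaries are located with `bytes.LastIndexByte` and `bytes.IndexByte` only around an actual hit.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchingLines returns each line of buf that contains needle. It never
// splits buf into lines up front: it searches the whole buffer and only
// looks for the surrounding newlines once a match is found.
// Assumption: needle does not contain '\n'.
func matchingLines(buf, needle []byte) []string {
	var out []string
	if len(needle) == 0 {
		return out
	}
	for off := 0; off < len(buf); {
		i := bytes.Index(buf[off:], needle)
		if i < 0 {
			break
		}
		hit := off + i

		// The line starts just after the previous newline. LastIndexByte
		// returns -1 when there is none, which gives a start of 0.
		start := bytes.LastIndexByte(buf[:hit], '\n') + 1

		// The line ends at the next newline, or at the end of the buffer.
		end := bytes.IndexByte(buf[hit:], '\n')
		if end < 0 {
			end = len(buf)
		} else {
			end += hit
		}

		out = append(out, string(buf[start:end]))
		// Resume after this line so each line is reported once.
		off = end + 1
	}
	return out
}

func main() {
	buf := []byte("alpha\nbeta FunctionName gamma\ndelta\nFunctionName twice FunctionName")
	for _, line := range matchingLines(buf, []byte("FunctionName")) {
		fmt.Println(line)
	}
}
```

When matches are rare, as they usually are in a large source tree, almost all of the input is handled by the search call and no per-line work is done at all.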


arsham 2933 days ago

So far I've only been concerned with the code's simplicity until I understand what needs to be done. This is not going to be grep or ripgrep. My intent was to make a tool I needed, so I started working on it. I thought someone else might like it, and now it is a joy to see people looking at the project.

There are a couple of places where I wish I had done better. Using bufio.Scanner actually bothers me a lot. Also, the Read() method reads everything from all readers into a buffer instead of pulling only what it needs to check.
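
One possible shape for the "pull only what it needs" version, as a sketch only and not the project's Read() method: chain the inputs with `io.MultiReader` and search them in fixed-size chunks, carrying over the last `len(needle)-1` bytes so a match that straddles two reads is still found. `streamContains`, `containsAcross`, and the chunk size are hypothetical names and values picked for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// streamContains reports whether needle occurs anywhere in r. It reads at
// most chunkSize new bytes per iteration instead of loading all of r.
// Assumption: chunkSize is greater than zero.
func streamContains(r io.Reader, needle []byte, chunkSize int) (bool, error) {
	if len(needle) == 0 {
		return true, nil
	}
	keep := len(needle) - 1 // tail carried over between reads
	buf := make([]byte, keep+chunkSize)
	filled := 0
	for {
		n, err := r.Read(buf[filled:])
		filled += n
		if bytes.Contains(buf[:filled], needle) {
			return true, nil
		}
		if err == io.EOF {
			return false, nil
		}
		if err != nil {
			return false, err
		}
		// Slide the last keep bytes to the front to make room.
		if filled > keep {
			copy(buf, buf[filled-keep:filled])
			filled = keep
		}
	}
}

// containsAcross treats parts as separate inputs, read lazily one after
// another through io.MultiReader.
func containsAcross(needle string, chunkSize int, parts ...string) (bool, error) {
	readers := make([]io.Reader, len(parts))
	for i, p := range parts {
		readers[i] = strings.NewReader(p)
	}
	return streamContains(io.MultiReader(readers...), []byte(needle), chunkSize)
}

func main() {
	// The needle is split across two inputs and is still found.
	found, err := containsAcross("FunctionName", 4, "xxFunc", "tionName yy")
	fmt.Println(found, err)
}
```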

Thanks for the suggestions :)


nazri1 2934 days ago

I'm okay with it not being as fast, because speed is not the goal here. The goal is highlighting specific patterns to make it easier to spot for the human eyes, especially when tailing log lines from your development webserver.


JackCh 2934 days ago

> "to make it easier to spot for the human eyes,"

I suppose in that sense it does aim to be fast. Fast for the human to parse.


sliken 2932 days ago

I've got the linux-4.17 kernel tree around, 61,322 files.

My desktop is running ubuntu-18.04, is an i5-3570, and has a fairly quick intel SSD.

Running "blush -R -i FunctionName ." takes 15.090 seconds and finds two files.

Running "ag -i FunctionName" finds one file, missing the match in .clang-format.

Running "ag -i -u FunctionName" finds two files and takes 0.64 seconds.

So ag is somewhere around 20-25x faster (15.090 / 0.64 ≈ 23.6).


arsham 2926 days ago

Thank you for doing the comparison. Would you do the same against the latest version (v0.5.0) please? Thank you.


valarauca1 2934 days ago

I’m doubtful ag and rg use a lot of smart optimizations to get their speed.


deckarep 2934 days ago

Ummm...you're kidding, right?

I know ripgrep has a ton of fantastic optimizations by Burntsushi.

You might wanna check it out...before making such statements.


dagenix 2934 days ago

I could be wrong, but I read valarauca1's comment as "I’m doubtful. ag and rg use a lot of smart optimizations to get their speed."


mlevental 2934 days ago

lol, people's reading comprehension is so bad sometimes


laumars 2934 days ago

In fairness, it's the GP's fault on this occasion for not punctuating his or her post. deckarep just read the GP's post as it was literally written (I had to read it 3 times myself to gauge what I thought the post meant).


mlevental 2933 days ago

contextualization is a component of reading comprehension


amhokies 2933 days ago

Reading comprehension is difficult when there's missing punctuation. For example, "Let's eat grandma" versus "Let's eat, grandma".


mlevental 2933 days ago

contextualization is a component of reading comprehension
