| HN Mirror



	by mvkg 2978 days ago
	A quick look at the source shows that it appears to be linear and just uses `strings.Contains` or `r.MatchString` on each line, so I don't think it has any of the optimizations that are built into `ag`.
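	For anyone who hasn't opened the source, the per-line approach described above looks roughly like the sketch below. This is not the project's actual code: `grepLines` and its parameters are hypothetical names, and the only thing it is meant to show is the shape of the loop (one `strings.Contains` or `MatchString` call per scanned line).

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// grepLines is a hypothetical helper, not taken from the project.
// It scans input line by line and keeps every line that matches.
// If pattern is non-empty it is compiled as a regular expression;
// otherwise substr is used as a plain substring test.
func grepLines(input, substr, pattern string) ([]string, error) {
	var re *regexp.Regexp
	if pattern != "" {
		var err error
		re, err = regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
	}

	var hits []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := sc.Text()
		matched := false
		if re != nil {
			matched = re.MatchString(line) // one regex call per line
		} else {
			matched = strings.Contains(line, substr) // one substring call per line
		}
		if matched {
			hits = append(hits, line)
		}
	}
	return hits, sc.Err()
}

func main() {
	hits, err := grepLines("foo\nbar\nfoobar\n", "foo", "")
	fmt.Println(hits, err)
}
```

	Every line pays the cost of being split out and checked on its own, whether or not it can match, which is the part the faster tools try to avoid.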


arsham 2978 days ago

That is correct. The project is in its early stages. I want to see what the community needs most and shape the project towards that goal. I have also tried to avoid optimisations until most of the functionality is implemented.


ozkatz 2977 days ago

It's a very nice idea and you should be proud of what you've built, but my personal opinion is that speed is a core feature of `grep`.

A good place to start would be "why GNU grep is fast" [1]: begin with the Boyer-Moore string search algorithm, then read through the other optimizations done in GNU grep.

P.S. There's an implementation of Boyer-Moore hiding in Go's standard library.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...


arsham 2977 days ago

Thanks mate, I will definitely have a read.


burntsushi 2977 days ago

Note that you don't need Boyer-Moore for the common case. ripgrep, for example, will very rarely use Boyer-Moore. Its workhorse is much simpler and typically faster: https://github.com/rust-lang/regex/blob/master/src/literal/m...

In Go-land, you should be able to replace uses of memchr with IndexByte[1], which should be implemented in assembly on most platforms.
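A rough sketch of what that could look like, under some assumptions: `indexLiteral` is a made-up name, and it always scans for the last byte of the needle, which is only a stand-in for picking a byte that is actually rare in typical input. The point is just that `bytes.IndexByte` does the skipping and the full comparison only runs on candidates.

```go
package main

import (
	"bytes"
	"fmt"
)

// indexLiteral is a hypothetical helper. It returns the index of the
// first occurrence of needle in haystack, or -1 if there is none.
// It uses bytes.IndexByte (the memchr equivalent) to jump to
// candidate positions, then verifies each candidate with bytes.Equal.
func indexLiteral(haystack, needle []byte) int {
	n := len(needle)
	if n == 0 {
		return 0
	}
	// Assumption: the last byte of the needle is used as the scan byte.
	// A real implementation would prefer a byte that is rare in practice.
	last := needle[n-1]

	// pos is the earliest index where the last byte of a match could sit.
	pos := n - 1
	for pos < len(haystack) {
		i := bytes.IndexByte(haystack[pos:], last)
		if i < 0 {
			return -1
		}
		end := pos + i       // index of the candidate's last byte
		start := end - n + 1 // index where the candidate match would begin
		if bytes.Equal(haystack[start:end+1], needle) {
			return start
		}
		pos = end + 1
	}
	return -1
}

func main() {
	fmt.Println(indexLiteral([]byte("hello world"), []byte("world")))
}
```

For plain literals, `bytes.Index` in the standard library is the obvious alternative; the sketch is only meant to make the memchr idea concrete.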

Of course, for any of this to have a big impact, you'll want to take Mike Haertel's advice on avoiding line breaking and stop using bufio.Scanner. :-)

[1] - https://golang.org/pkg/bytes/#IndexByte
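To make the line-breaking point concrete, here is a minimal sketch of searching a whole buffer first and only looking for line boundaries once something matches. `matchingLines` is a hypothetical name, it assumes the needle contains no newline, and it assumes the data is already in memory; it is not how ripgrep or GNU grep is actually structured.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchingLines is a hypothetical helper. It searches buf as one block
// and extracts the enclosing line only when needle is found, so lines
// without a match are never split out individually.
// Assumption: needle is non-empty and contains no '\n'.
func matchingLines(buf, needle []byte) []string {
	var out []string
	if len(needle) == 0 {
		return out
	}
	pos := 0
	for pos < len(buf) {
		i := bytes.Index(buf[pos:], needle)
		if i < 0 {
			break
		}
		m := pos + i // absolute index of the match

		// Walk back to the start of the line; LastIndexByte returns -1
		// when there is no earlier newline, which makes start 0.
		start := bytes.LastIndexByte(buf[:m], '\n') + 1

		// Walk forward to the end of the line (or the end of the buffer).
		end := bytes.IndexByte(buf[m:], '\n')
		if end < 0 {
			end = len(buf)
		} else {
			end += m
		}

		out = append(out, string(buf[start:end]))
		pos = end + 1 // resume after this line so it is reported once
	}
	return out
}

func main() {
	buf := []byte("alpha\nbeta gamma\ndelta\ngamma ray\n")
	for _, line := range matchingLines(buf, []byte("gamma")) {
		fmt.Println(line)
	}
}
```

Compared with a `bufio.Scanner` loop, the newline search here happens once per matching line instead of once per line in the input.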


arsham 2977 days ago

So far I've only been concerned with keeping the code simple until I understand what needs to be done. This is not going to be grep or ripgrep. My intent was to make a tool I needed, so I started working on it. I thought someone else might like it, and now it is a joy to see people looking at the project.

There are a couple of places where I wish I had done better. Using bufio.Scanner actually bothers me a lot. Also, the Read() method reads everything from all readers into a buffer instead of pulling only what it needs to check.

Thanks for the suggestions :)
