Hacker News
by mvkg 2928 days ago
A quick look at the source shows that it appears to be linear and just uses `strings.Contains` or `r.MatchString` on each line, so I don't think it has any of the optimizations that are built into `ag`.

That is correct. The project is at an early stage. I want to see what the community needs most and shape the project towards that goal. I have also tried to avoid optimisations until most of the functionality is implemented.
It's a very nice idea and you should be proud of what you've built, but my personal opinion is that speed is a core feature of `grep`.

A good place to start would be "why GNU grep is fast"[1]: start with the Boyer-Moore string search algorithm, then read through the optimizations done in GNU grep.

p.s. there's an implementation of Boyer-Moore hiding in Go's standard library.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...
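
For anyone who has not met Boyer-Moore before, here is a minimal Go sketch of the Horspool simplification, which keeps only the bad-character shift table. It is not the full Boyer-Moore algorithm, and it is not what GNU grep or Go's standard library actually ship; the name `horspoolIndex` is made up for illustration.

```go
package main

import "fmt"

// horspoolIndex returns the index of the first occurrence of needle in
// haystack, or -1 if it is absent. Hypothetical helper: Horspool variant
// of Boyer-Moore, using only the bad-character shift table.
func horspoolIndex(haystack, needle []byte) int {
	n, m := len(haystack), len(needle)
	if m == 0 {
		return 0
	}
	if m > n {
		return -1
	}
	// shift[b] is how far the window may slide when its last byte is b.
	var shift [256]int
	for i := range shift {
		shift[i] = m
	}
	for i := 0; i < m-1; i++ {
		shift[needle[i]] = m - 1 - i
	}
	for pos := 0; pos+m <= n; {
		// Compare the window against the needle from right to left.
		j := m - 1
		for j >= 0 && haystack[pos+j] == needle[j] {
			j--
		}
		if j < 0 {
			return pos
		}
		// Skip ahead based on the byte under the window's last position.
		pos += shift[haystack[pos+m-1]]
	}
	return -1
}

func main() {
	text := []byte("why GNU grep is fast")
	fmt.Println(horspoolIndex(text, []byte("grep"))) // prints 8
}
```

The point of the shift table is that longer needles let the search skip most of the haystack without looking at it.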

Thanks mate, I will definitely have a read.
Note that you don't need Boyer-Moore for the common case. ripgrep, for example, will very rarely use Boyer-Moore. Its workhorse is much simpler and typically faster: https://github.com/rust-lang/regex/blob/master/src/literal/m...

In Go-land, you should be able to replace uses of memchr with IndexByte[1], which should be implemented in assembly on most platforms.
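
A rough sketch of the IndexByte idea, assuming a literal pattern: scan for one byte of the needle with `bytes.IndexByte`, then verify the whole needle around each hit. `indexViaByte` and its `offset` parameter are hypothetical names; this is not ripgrep's code, and choosing a good (rare) byte is left to the caller here.

```go
package main

import (
	"bytes"
	"fmt"
)

// indexViaByte returns the index of the first occurrence of needle in
// haystack, or -1. It scans for needle[offset] with bytes.IndexByte and
// checks the full needle around each candidate. Hypothetical helper:
// offset must be a valid index into needle, ideally of a rare byte.
func indexViaByte(haystack, needle []byte, offset int) int {
	if len(needle) == 0 {
		return 0
	}
	b := needle[offset]
	// The chosen byte cannot belong to a match before position offset.
	pos := offset
	for pos < len(haystack) {
		i := bytes.IndexByte(haystack[pos:], b)
		if i < 0 {
			return -1
		}
		start := pos + i - offset
		stop := start + len(needle)
		if stop <= len(haystack) && bytes.Equal(haystack[start:stop], needle) {
			return start
		}
		pos += i + 1
	}
	return -1
}

func main() {
	text := []byte("the quick brown fox")
	fmt.Println(indexViaByte(text, []byte("quick"), 0)) // prints 4
}
```

The fewer times the chosen byte shows up in the input, the more time is spent inside `bytes.IndexByte` and the less in the verification step.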

Of course, for any of this to have a big impact, you'll want to take Mike Haertel's advice on avoiding line breaking and stop using bufio.Scanner. :-)

[1] - https://golang.org/pkg/bytes/#IndexByte
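
To illustrate the "avoid line breaking" advice, here is a sketch that searches the whole buffer first and only looks for newlines around a hit. It assumes a literal pattern that contains no newline and input that is already in memory; `matchingLines` is a made-up name.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchingLines returns every line of buf that contains pattern, without
// splitting buf into lines up front. Hypothetical helper; assumes pattern
// is a literal with no '\n' in it.
func matchingLines(buf, pattern []byte) [][]byte {
	var lines [][]byte
	pos := 0
	for pos < len(buf) {
		i := bytes.Index(buf[pos:], pattern)
		if i < 0 {
			break
		}
		hit := pos + i
		// Line start: one past the previous newline, or 0 if there is none.
		start := bytes.LastIndexByte(buf[:hit], '\n') + 1
		// Line end: the next newline after the hit, or the end of the buffer.
		end := len(buf)
		if j := bytes.IndexByte(buf[hit:], '\n'); j >= 0 {
			end = hit + j
		}
		lines = append(lines, buf[start:end])
		// Resume after this line so it is reported only once.
		pos = end + 1
	}
	return lines
}

func main() {
	buf := []byte("alpha\nbeta grep\ngamma\ngrep delta\n")
	for _, line := range matchingLines(buf, []byte("grep")) {
		fmt.Printf("%q\n", line)
	}
}
```

Lines without a match are never delimited at all, which is where the saving over a per-line scanner comes from.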

So far I've only been concerned with keeping the code simple until I understand what needs to be done. This is not going to be grep or ripgrep. My intent was to make a tool I needed, so I started working on it. I thought someone else might like it, and now it is a joy to see people looking at the project.

There are a couple of places where I wish I had done better. Using bufio.Scanner actually bothers me a lot. Also, the Read() method reads everything from all readers into a buffer instead of pulling only what it needs to check.

Thanks for the suggestions :)