by cutler 2093 days ago
Much as I would love to believe Ruby 3.0 delivers some kind of speed bump, my simple test of doing what Ruby supposedly does best - parsing a log file with a regex - shows Ruby 16% slower than the Python equivalent.

  Ruby
  puts IO.foreach('logs1.txt').grep /\b\w{15}\b/

  Python
  from re import compile

  with open('logs1.txt', 'r') as fh:
      regex = compile(r'\b\w{15}\b')
      for line in fh:
          if regex.search(line): print(line, end='')
On my MacBook Pro (2013) running Catalina, Ruby averaged 1.49 secs and Python 1.27 secs. The file `logs1.txt` is a 20 MB Apache log file. Pre-compilation with:

   reg = Regexp.compile(/\b\w{15}\b/)
   puts IO.foreach('logs1.txt').grep(reg)
... slowed Ruby down to 1.57 secs.

Using --jit didn't change Ruby's overall time, but considering that it adds 700ms to Ruby's startup time, the execution time itself was faster.

3 comments

The programs aren't doing the same thing. The Ruby one seems to buffer the matches into a huge string, then print. Don't know how different it would be, but better to compare apples to apples.
Probably the equivalent thing to test would be

      regexp = /\b\w{15}\b/
      IO.foreach('logs1.txt') do |line|
        puts line if regexp.match?(line)
      end
I imagine this will in both cases mostly be testing the computer's IO.

Regexp.compile doesn't do anything substantially different from Regexp.new or a literal (i.e. it doesn't optimize the regular expression or something) so I think the difference is just random fluctuations.
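
A quick way to check this for yourself; just a sketch, and `same_pattern?` is a helper name made up for illustration. It compares the source and options of a literal, a `Regexp.new` result and a `Regexp.compile` result:

```ruby
# Hypothetical helper: true when two regexps share source and options.
def same_pattern?(a, b)
  a.source == b.source && a.options == b.options
end

literal  = /\b\w{15}\b/
via_new  = Regexp.new('\b\w{15}\b')      # single quotes keep the backslashes
compiled = Regexp.compile('\b\w{15}\b')

puts same_pattern?(literal, via_new)
puts same_pattern?(literal, compiled)
```

This only shows that the resulting objects describe the same pattern; it says nothing about timing, which you would still have to measure.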

IO.foreach without a block returns an enumerator.
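
A small sketch of what that means in practice. The temp file and its three lines are made-up stand-ins for `logs1.txt`, and `write_sample_log` is a hypothetical helper:

```ruby
require 'tempfile'

# Hypothetical helper: writes a few made-up log lines and returns the Tempfile.
def write_sample_log
  file = Tempfile.new('sample_log')
  file.write("short line\nabcdefghijklmno here\nanother\n")
  file.close
  file
end

sample = write_sample_log

lines = IO.foreach(sample.path)     # no block given, so this is an Enumerator
puts lines.class

matches = lines.grep(/\b\w{15}\b/)  # grep walks the file and collects an Array
puts matches.class
puts matches.length

sample.unlink
```

So the one-liner holds every matching line in memory before `puts` prints anything, whereas the block form prints as it goes.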
You may be right. This version:

    IO.foreach('logs1.txt') {|x| puts x if /\b\w{15}\b/.match? x }
... runs in 1.3 secs, i.e. the same time as Python. Still, I expected Ruby 3x3 to beat standard Python.
That's not really surprising since regex in most languages is PCRE and not really indicative of actual interpreter performance.
If that's the case why is Ruby slower?
Your Ruby snippet is slower for two reasons:

  - IO.foreach is lazy, but then you call `Enumerable#grep` on it, which allocates a huge Array
  - `grep` populates the "last_match" object; `String#match?` is much faster.
Try rifraf's snippet: https://news.ycombinator.com/item?id=24597067, it should be as fast as Python.
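
To see the last-match difference in isolation, here is a minimal sketch. It uses `=~` as the comparison point rather than `grep`, since whether `grep` sets `$~` may depend on the Ruby version. The two method names and the sample line are made up for illustration:

```ruby
# $~ is local to each method call and starts out as nil.

# Hypothetical helper: match with =~, then report what $~ holds.
def last_match_after_operator(line, regexp)
  line =~ regexp
  $~
end

# Hypothetical helper: match with match?, then report what $~ holds.
def last_match_after_predicate(line, regexp)
  regexp.match?(line)
  $~
end

line   = 'GET /abcdefghijklmno HTTP/1.1'
regexp = /\b\w{15}\b/

p last_match_after_operator(line, regexp).class   # a MatchData was built
p last_match_after_predicate(line, regexp)        # nil: nothing was recorded
```

Skipping the MatchData is where `match?` saves work; how much that matters for a 20 MB file is something to measure rather than assume.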
Because what you’re really measuring is something like how well the block size the interpreter uses for I/O matches the underlying OS/hardware.
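
One rough way to check how much of the total is I/O is to time the read and the matching separately. This is only a sketch: `timed` and `split_timings` are made-up names, the generated file stands in for `logs1.txt`, and reading everything into memory first changes the allocation pattern compared with streaming.

```ruby
require 'tempfile'

# Hypothetical helper: runs the block, returns [elapsed_seconds, block_result].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - start, result]
end

# Hypothetical helper: times reading and matching as two separate steps.
# Returns [read_seconds, match_seconds, matching_lines].
def split_timings(path, regexp)
  read_seconds, lines = timed { File.readlines(path) }
  match_seconds, matches = timed { lines.select { |l| regexp.match?(l) } }
  [read_seconds, match_seconds, matches]
end

# Made-up stand-in for logs1.txt: 10,000 generated lines.
sample = Tempfile.new('sample_log')
10_000.times { |i| sample.puts(i.even? ? 'GET /abcdefghijklmno' : 'GET /index') }
sample.close

read_s, match_s, matches = split_timings(sample.path, /\b\w{15}\b/)
puts format('read: %.4fs  match: %.4fs  matched: %d', read_s, match_s, matches.size)
sample.unlink
```

If the read step dominates on the real file, the benchmark says more about buffering than about either interpreter.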
If they are both doing similar things and one is faster, that's still significant. So Python's I/O is faster than Ruby's.
But it’s a super easy to read one-liner in Ruby!