by emj 3137 days ago
About as fast as numpy. More tools for creating fast code are always great, but the tooling for using Rust/C from Python needs to be easier; I just can't be bothered most of the time.

This numpy version gets a better relative boost on my machine, YMMV.

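    import numpy as np

    def count_double_chars_np(val):
        # Assumes val is an ASCII str; np.fromstring is deprecated for binary data.
        ng = np.frombuffer(val.encode("ascii"), dtype=np.byte)
        # Compare each byte with its successor and count the matches.
        return np.sum(ng[:-1] == ng[1:])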

    def test_np(benchmark):
        benchmark(count_double_chars_np, val)

Good numpy implementation of the algorithm. If, for whatever reason, numpy isn't available, you can also pull it off with a generator expression:

    def count_doubles2(val):
        return sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)
This also lets you skip the function call entirely by inlining the expression, if that is useful in some way:

    In [56]: %timeit count_doubles(val)
    198 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [57]: %timeit count_doubles2(val)
    189 ms ± 21.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [58]: %timeit sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)
    135 ms ± 3.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [59]: %timeit count_double_chars_np(val)
    6.95 ms ± 782 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(Numpy still beats it for long strings.)
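For anyone without IPython, the function-versus-inline comparison above can be roughly reproduced with the standard `timeit` module. This is only a sketch: the thread does not show how `val` was built, so the random ASCII string here is an assumption, and absolute numbers will differ by machine.

```python
import random
import string
import timeit

# Hypothetical input: the thread does not show how `val` was built.
random.seed(0)
val = "".join(random.choice(string.ascii_letters) for _ in range(100_000))

def count_doubles2(val):
    # Pair each character with its successor and count the equal pairs.
    return sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)

# Tiny worked example: "aabbbc" contains the equal pairs aa, bb, bb.
small_count = count_doubles2("aabbbc")

# Time the function call against the same expression inlined.
loops = 5
t_func = timeit.timeit(
    "count_doubles2(val)",
    globals={"count_doubles2": count_doubles2, "val": val},
    number=loops,
)
t_inline = timeit.timeit(
    "sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)",
    globals={"val": val},
    number=loops,
)
print(f"function: {t_func / loops * 1e3:.2f} ms per loop")
print(f"inline:   {t_inline / loops * 1e3:.2f} ms per loop")
```

Both statements run the same generator expression, so any gap between them comes from call overhead and name lookups rather than from the counting itself.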
Hi, can you send a Pull Request including your numpy implementation? https://github.com/rochacbruno/rust-python-example I would like to add it there just for the record and then I will update the article.
Thank you for the very nice, educative article, Bruno!

If the performance of counting character pairs really were the issue here, then in addition to the numpy approach already suggested, I'd wager that an implementation using re2 [1], a drop-in replacement for the standard re package, would be just as competitive.
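A regex version of the pair count might look like the sketch below. It uses the standard `re` module rather than re2: the pattern relies on a lookahead and a backreference, and whether the re2 package accepts it unchanged is an assumption the thread does not confirm. `count_doubles_re` is a hypothetical name, and no timing is claimed for it.

```python
import re

# Zero-width lookahead so overlapping pairs are all counted ("aaa" has two);
# DOTALL lets "." match newline characters as well.
_DOUBLE = re.compile(r"(?=(.)\1)", re.DOTALL)

def count_doubles_re(val):
    # Each match marks a position where a character equals its successor.
    return sum(1 for _ in _DOUBLE.finditer(val))
```

Without the lookahead, a plain `(.)\1` pattern would consume both characters of each match and undercount runs of three or more.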

But I want to point out that all this performance comparison of trivial character counting distracts from the core idea here: after all, you'd use a low-level implementation in Rust (or C/C++/Cython, for that matter) precisely when such "nifty tricks" are not available. So again, thanks for the article, and do consider whether you really want these performance issues to degrade the article into an only marginally relevant performance "showdown".

[1] https://pypi.python.org/pypi/re2/

Thanks would be nice. You should be able to just copy and paste that one-liner, but I'm not sure your blog post is better for it. The idea that Rust is easier to include in Python is important enough, and numpy is a bit of an edge case, imho.

Ideas are always CC-zero