| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ianhorn 1831 days ago

Regarding your performance work, you've nerd sniped me into looking for analytical tricks to speed it up ;) We'll see...

Regarding the readability, I suppose it's a matter of taste at this point, but if I swapped `import numpy as np` for `from numpy import newaxis, exp, sqrt`, then IMO, the numpy is more readable:

    A = a[newaxis]
    M = exp(1j * k * sqrt(A**2 + A.T**2))

But then my tastes also run towards preferring the handwritten definitions to be in terms of vectors and transposes instead of elements and indices, at least until you get to many indexed tensors.

Side note: any idea why the python and fortran are exp(i k... while the julia is exp((100 +i) i...)? Is it something I overlooked?

1 comments

eigenspace 1831 days ago

> Regarding your performance work, you've nerd sniped me into looking for analytical tricks to speed it up ;) We'll see...

Looking forward to it! These microbenchmarks are very fun to explore.

> Side note: any idea why the python and fortran are exp(i k... while the julia is exp((100 +i) i...)? Is it something I overlooked?

Oh, that is a partially applied edit I guess. When the author posted on the julia forum, people pointed out that since the exponent was pure imaginary, it could be speeded up even more with cis(...) instead of exp(im * ...) but then the author claimed that exp((100 + im) * ...) was more representative of his actual workflow and I guess changed the julia version in his blogpost but not the Python or Fortran versions.

link

ianhorn 1831 days ago

I got bored with trying to find an analytical boost, but I benchmarked a couple IMO super readable python versions (basically what's in my original comment after making the (100+i)i change):

https://colab.research.google.com/drive/1ABrZJlm8pwB6_Sd6ayO...

On my macbook, using XLA's jit in python gave about a 12-15x speedup on CPU over OP's solution, which was pretty cool, but I'm too lazy to figure out how to install and benchmark Julia on my machine. Applying a 12-15x speedup would at least beat the Julia MT solution in OP, and you've got to admit `exp(CONST * sqrt(A**2 + A.T**2))` is a pretty clean way to do it.

Then I ran on whatever GPU colab decided to give me (a P100), and for just adding a decorator, it's a 1000x-1900x speedup (better as n goes up). Hence my current honeymoon period with jax. I love the speed vs readability tradeoff.

link

celrod 1831 days ago

> I'm too lazy to figure out how to install and benchmark Julia on my machine.

Assuming you're eager to try once you've found out:

You should be able to simply download and unpack a binary from: https://julialang.org/downloads/ I'd strongly recommend going with the current stable release (1.6.1).

To start the Julia REPL:

  bin/julia

To install packages:

  using Pkg
  Pkg.add("BenchmarkTools")
  Pkg.add("LoopVectorization")

then you can copy/paste code from eigenspaces link to discourse to run Julia benchmarks. E.g., you can copy/paste from https://discourse.julialang.org/t/i-just-decided-to-migrate-... if you define

  using LoopVectorization
  const var"@tvectorize" = var"@tturbo"

(because `@tvectorize` has since been renamed on the latest releases)

link

celrod 1831 days ago

Also, note that Julia starts with only a single thread by default. You'd either need to add `-t4` when starting julia, e.g. `julia -t4`, or set the environmental variable `JULIA_NUM_THREADS=4` for 4 threads, for example.

link

zwaps 1831 days ago

Looks good, but I actually love that Julia gets at least the same performance right out of the box, with full interoperability and the great multiple dispatch logic.

Would be interesting to run your optimized code against the Julia optimized code that was linked to in this thread. Or run a Julia GPU benchmark as well.

As for loops versus vectorization - I personally am used to vectorized code as it has always been a requirement. Sometimes it makes sense to use (e.g. Matrix Algebra stuff).

On the other hand, I have also been in many situations where I could not vectorize my code. More code than I'm happy to admit includes list or dict comprehensions or even loops. Sure, not all performance critical. Nevertheless, the prospect of performance no matter what is pretty exciting.

Finally, threading and multiprocessing in python is a headache compared to Julia imo. Sometimes I just miss the good ol' "parfor" from Matlab.

All in all, Julia is a really appealing value proposition to people doing numeric computing.

link

enriquto 1830 days ago

> the great multiple dispatch logic

Curiously, as much as I love the Julia language, I cannot stand multiple dispatch. I find it extremely confusing, ugly, and downright disturbing. Why would you ever want two different functions with the same name? And if this is good, why stop here? Why not two different variables with the same name but different types? Depending on the context where they are used, the language could pick one or the other.

link

DNF2 1830 days ago

But this is what most languages do, except they have _singular_ dispatch. Every OOP language has different behaviour for different input types, except it only works for the _first_ input argument. Julia simply extends this to every argument.

How would you feel about having to write `sqrt_int32(x)`, `sqrt_int64(x)`, `sqrt_float64(x)`, `sqrt_rational_complex_float16(x)`, etc, etc, etc, etc, instead of just `sqrt`, and let dispatch or overloading take care of the different implementations?

link

enriquto 1830 days ago

> Every OOP language has different behaviour for different input types

Yeah. And I find OOP horrific.

> How would you feel about having to write `sqrt_int32(x)`, `sqrt_int64(x)`, `sqrt_float64(x)`,

What kind of savage wants the square root of an integer? I'd prefer if the language only had a single number type (maybe configurable at once by an external option) and a single sqrt function.

But if you talk about the general problem of naming functions differently according to their types, I feel that it's perfectly OK. A tiny price that I'm eager to pay for the large benefit of being able to identify which function is called just by looking at its name (and not at the---possibly yet undetermined---types of its arguments).

link