| HN Mirror

by hansvm 1191 days ago

1. The compiler will vectorize simple operations like that pretty well.

2. Zig has built-in @Vector types: fixed-size data types designed to compile down to SIMD as efficiently as possible, even when you ask for 16-wide operations on a CPU that only supports 8-wide SIMD. You'd often write your high-level code as a runtime-known iteration count over those comptime-known vector widths.

2a. Inline assembly or inspecting the architecture before choosing the @Vector width are both options, so you can write your high-level code with that information in mind if necessary (e.g., to make bolt vector quantization work well in Zig I'm pretty sure you need to inline-assembly one of the swizzling operations).

3. You can always link to LAPACK and friends. Zig has a great C interface.

4. Matrix/tensor ops aren't built-in. That doesn't matter for a lot of what this demo shows since it'll be RAM/cache bandwidth bound, but you'd definitely need to link in or hand-code inner product routines to have better asymptotics and cache friendliness if you were doing too many large matrix multiplies.

5. Wrapping any of the above into a library would be pretty easy. That sort of code is easy to write, so I haven't looked to see what other people have made in that space, but I'm sure there's something.
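
A rough sketch of the loop shape in point 2: a runtime-known length walked in compile-time-known chunks, with a scalar tail for the leftovers. It's written in C++ to match the other snippet in this thread rather than Zig. `add_in_chunks` and the width `W` are made-up names, and `std::array<float, W>` only stands in for Zig's `@Vector(W, f32)`; it is not an equivalent type.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper: a[i] += b[i], processed W lanes at a time.
// W is known at compile time; the element count is only known at runtime.
template <std::size_t W>
void add_in_chunks(std::vector<float>& a, const std::vector<float>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t i = 0;

    // Full chunks: fixed-width inner loops the compiler can map to SIMD.
    for (; i + W <= n; i += W) {
        std::array<float, W> va;
        std::array<float, W> vb;
        for (std::size_t k = 0; k < W; ++k) {
            va[k] = a[i + k];
            vb[k] = b[i + k];
        }
        for (std::size_t k = 0; k < W; ++k) {
            va[k] += vb[k];
        }
        for (std::size_t k = 0; k < W; ++k) {
            a[i + k] = va[k];
        }
    }

    // Scalar tail: the elements that don't fill a whole chunk.
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```

Calling it as `add_in_chunks<8>(a, b)` fixes the chunk width at compile time; whether that width matches the hardware is exactly the question point 2a is about.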

2 comments

hansvm 1191 days ago

And their async/await is being redesigned IIRC, but at least the last version would make for a low-overhead (in both developer time and runtime) way to parallelize any of the above. In the old version it was totally trivial to write a high-performance parallel solver for sudoku variants (knight's move and whatnot), and I doubt the new version will be any worse. Not that the interior of a linear algebra operation is often the best place to apply parallelization, but if you had a good reason to do so it wouldn't be hard.

I'm not aware of anything in particular that would make multi-machine computations even slightly less painful than other languages, but maybe someone can chime in here with ideas.

waynecochran 1191 days ago

C++ / Eigen performs a lot of optimizations that you might expect from a Fortran compiler. For example, its ability to broadcast and perform reductions is pretty slick, e.g.:

        const double x = (A - B).colwise().squaredNorm().mean();

Short of having tensor cores, this does a good job of static optimization for underlying vectorization support.
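
For anyone who doesn't use Eigen, here is a plain-loop sketch of what that expression computes: the mean, over columns, of the squared norm of each column of A - B. It assumes both matrices are stored flat in column-major order, and `mean_colwise_squared_norm` is a made-up name. Eigen's expression templates aim to produce a fused loop along these lines instead of materializing A - B first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper; a and b each hold rows * cols values, column-major.
double mean_colwise_squared_norm(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t rows, std::size_t cols) {
    if (cols == 0) {
        return 0.0;  // assumption: an empty matrix yields 0 rather than NaN
    }
    double total = 0.0;
    for (std::size_t j = 0; j < cols; ++j) {
        double col_norm = 0.0;  // squared norm of column j of (A - B)
        for (std::size_t i = 0; i < rows; ++i) {
            const double d = a[j * rows + i] - b[j * rows + i];
            col_norm += d * d;
        }
        total += col_norm;
    }
    return total / static_cast<double>(cols);
}
```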

wyldfire 1191 days ago

> 2a. Inline assembly or inspecting the architecture before choosing the @Vector width are both options, so you can write your high-level code with that information in mind if necessary (e.g., to make bolt vector quantization work well in Zig I'm pretty sure you need to inline-assembly one of the swizzling operations).

Inline assembly is great but support for intrinsics would be really valuable for Zig IMO.

hansvm 1191 days ago

You can link in intrinsics too. Again, C compatibility is high.

wyldfire 1191 days ago

I could write a `pub fn main()` that consisted only of a call to a C library that implemented world_peace(). But it wouldn't change the fact that "support for intrinsics would be really valuable for Zig IMO."
