Hacker News new | ask | show | jobs
by fredrikbk 3154 days ago
Hi Hacker News! I’m one of the developers. This project was also featured in MIT News yesterday:

http://news.mit.edu/2017/faster-big-data-analysis-tensor-alg...

The code is available at:

https://github.com/tensor-compiler/taco/issues

I’m happy to discuss the project and to answer any questions :)

6 comments

The MIT story focuses on "big data," but it looks like this might be applicable to coupled-cluster calculations in physics/chemistry too. Is it? Can you compare/contrast with e.g. the Cyclops Tensor Framework (http://solomon2.web.engr.illinois.edu/ctf/) or NWChem's Tensor Contraction Engine (http://www.csc.lsu.edu/~gb/TCE/)?
Larry chose to focus on the big data part, because it is intuitive. But I think you're absolutely correct, that it has applications in physics/chemistry (and machine learning too). We're actually talking to people in our theoretical physics department, who may want to use taco for their QCD computations. There's also a new issue on our tracker about adding complex number support for nuclear computations and quantum computing: https://github.com/tensor-compiler/taco/issues/116.

The tensor contraction engine is great work, and focuses on dense tensor. We currently optimize for sparse tensors so TCE will do better than us for pure dense expressions. We want to bridge this gap next semester though.

The cyclops framework is also great. We discuss it in our related work, but we did not directly compare to it in our evaluation. The first version of it, like TCE, focused on dense tensors, and their main focus is on distributed computing, which we don't support yet (we do shared memory parallel execution at the moment). They have some followup work on sparse computations. The difference to our work is that they, if I read their paper correctly, transposes the data until they can call pre-existing matrix multiplication routines. This causes data movement overheads. Our work compiles expressions to work at the data at hand, without moving it.

This looks like it might be nice, but regular coupled cluster doesn't really have very many sparse matrices so mapping things into Blas3 is fine. Also if you go the reduced scaling route and use PNOs or OSVs everything is mapped into matrix matrix for CCSD anyways.
Just saw your talk. Man you are one hell of a speaker. Great slides, animations and presentation. I enjoyed it.

It seems like a strange thing that no one thought of doing this before! Nice that you identified a cool problem and solved it :)

Not entirely true. Sparse x Dense Matrix is an important component of the DSSTNE deep learning framework:

https://github.com/amzn/amazon-dsstne

But this library is clearly more thorough.

For sure, as you point out, many such kernels have been written, going at least as far back as 1967.

We believe our contribution is being able to generate kernels for all the expressions.

Thanks for the reference though! We’re trying to learn where sparsity may be important in neural networks.

Thanks, that is really nice of you :)
Skimming it quickly I didn’t see anything about gpu or vector instruction support as compilation targets. Is this planned? Did I miss something and this is at a higher layer?

Ps: excellent name

We currently compile to C code and use the system Compiler to compile it further. For dense loop nests it does a good job of auto-vectorizing, but we believe there’s good opportunities for doing something custom, knowing the high-level algebraic structure.

taco does not target GPUs yet, and we want to work on it this spring. It’s clearly needed, for example for neural networks

Hi, I just saw the presentation too, and it looks impressive.

I'm hoping that you can (at some point) write a language-agnostic library, so that your work can be used in many other development tools (which are not using C++). I suppose it would amount to publishing the documentation of the intermediate code you are already using. This would also save you the trouble of writing and maintaining a GPU back-end because other people could do that using your library.

Anyway, great work!

That is a good idea, but we are only so many. One student in the Julia group is developing Julia bindings though, and we hope to make C/python bindings. The inner workings are published in the paper “The Tensor Algebra Compiler”, so it can be implemented in other languages. We like the idea of bindings because then our future work can benefit more users.
Any chance you'd be interested in adding more direct support for nonlinear systems? That seems to be missing from most computation graph compiler libraries, e.g. Tensorflow.
Hi, that sounds very interesting and I hope someone works on it. We're only so many people though, and we have several things we want to do that will at least occupy us until next summer.
I entered "x(i) = y(i) + z(i)" in the online demo, all three sparse, and I noticed some useless variable definitions: int32_t iz0 = z1_idx[pz1];

Is this intentional? (if so, why)

We’ve tried to engineer it to emit clean code, but seems we left in some unused variables.

The C compilers dead code elimination should remove those though.