Hacker News new | ask | show | jobs
by deepnotderp 3275 days ago
Awesome! I have a couple of questions if you don't mind:

1) Does this require the programmer to rewrite software or make annotations? Even if that's the optimal way, can a compiler do this automatically, at least to some extent?

2) how general of a workload can this accelerate? You mention speculative multithreading and HTM in both papers, both of which are fairly general purpose techniques.

1 comments

1. Currently the programmer needs to delineate code into tasks. A couple of points to note: -- Often the programmer can start with the serial code, and determine how to delineate tasks. In our experience this is often easier than writing more complex parallelized versions of the same algorithm. Compared to the serial code, our programs are just a few percent longer. -- We have implemented high-level interfaces along the lines of OpenMP or OpenTM to make it easier for the programmer to break straight-line serial code into fine-grain tasks. For example, if you have a for loop to parallelize, you can use the fractal::forall(...) construct. Table 1 of the Fractal paper lists some of these high-level interface functions.

We are currently working on a compiler that can take serial code and automatically annotate tasks and domains. Keep watching our webpage for updates on that.

2. Our execution model subsumes both Transactional Memory (TM) and Thread-Level Speculation (TLS). Any program that can be expressed in those paradigms can be expressed in Fractal. And as we show in the paper, Fractal can express parallelism that was not possible in TLS or TM. So we believe it is general purpose. Further. as indicated above, we have seen these ideas to be applicable to applications in a wide range of domains.

That being said, we are always on the lookout for applications that reveal deficiencies in our execution model, to help improve it :)

1. Yes ,I've seen that, and I think that an OpenMP-esque annotation system specified by the programmer would be optimal, but I don't see any reason that a "sufficiently smart compiler" [ ;) ] can't produce an approximation. Given the SpMT-like "thread" (task) squashing, it shouldn't cause any semantic issues.

2. Absolutely, SpMT has been proposed as a general purpose technique, that's why I found it odd that the benchmarks weren't on a general purpose benchmark like SpecINT, do you have any benchmarks on SpecINT or any plan to do so?

2: We are currently working on a compiler to automatically delineate tasks, assign domains etc. given sequential code. As part of that effort, we are looking at SPEC workloads, yes.