I tried to understand what is going on in this paper. Am I right that they parallelize the subtasks within each domain and, at each point in time, take advantage of unordered tasks where possible?
Correct. One of the goals of this work was to expose and exploit the abundant parallelism within large atomic tasks. Note, though, that the tasks within a domain can be either ordered or unordered. Our system exploits whatever parallelism is available.
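
As a toy illustration of the ordered/unordered distinction only (this is not the system described in the paper; `run_domain`, `subtask`, and the `ordered` flag are hypothetical names made up for this sketch), the idea might look roughly like this: ordered subtasks run one after another, while unordered ones are handed to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def subtask(x):
    # Hypothetical stand-in for one piece of work inside a large task.
    return x * x


def run_domain(items, ordered):
    """Run the subtasks of one domain; results come back in input order."""
    if ordered:
        # Ordered tasks: each one finishes before the next one starts.
        return [subtask(x) for x in items]
    # Unordered tasks: no dependencies assumed, so they may run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(subtask, items))


if __name__ == "__main__":
    print(run_domain([1, 2, 3], ordered=True))
    print(run_domain([1, 2, 3], ordered=False))
```

Both calls return the same values here; the only difference is whether the subtasks are allowed to overlap in time.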