| > He didn't go into why vector instructions are hard to use efficiently, and hard for compiler writers, but I'd be interested if anyone here could explain that. I have a simplistic explanation - maybe not what you're looking for but it is the best I can do... At 12m23s in the video he says, "If you're working in a layer and the layers are well constructed (abstracted) you really can make a lot of progress. But if the top layer says, 'to make this really fast, go change the bottom layer', then its going to get all tangled up." That's what implementing an algorithm on a SIMD architecture feels like to me. I have to figure out a way of filling my SIMD width with data each clock cycle, while in contrast, the specification of the algorithm deals with data one piece at a time. Take insertion sort as a (bad) example. i ? 1
while i < length(A)
j ? i
while j > 0 and A[j-1] > A[j]
swap A[j] and A[j-1]
j ? j - 1
end while
i ? i + 1
end while
That algorithm cannot easily take advantage of SIMD. You have to change the algorithm to make it work with the architecture.We'd probably say the algorithm is the top level of the abstraction stack, and the SIMD architecture is a level near the bottom. So this problem is the opposite way around to how Jim phrased it, but the point is that we have NOT got clean abstraction - an implementation in one layer depends on the implementation in another. |