|
|
|
|
|
by axman6
4468 days ago
|
|
How do you get ILP from this? Seems like it would make scheduling much more difficult because you now need to make sure that that the results you need for each instruction are in the order in the belt you need, and you have to be able to execute any order of instruction types. The mill can run fast partly because and instruction can use any result on the belt, and similar instructions are grouped together making decode simpler (from memory, they have two separate decoders and each instruction has all say arith instructions grouped and decoded by one decode, and all the others decoded by the other). Basically, I don't see how you can use what you suggest to do, in one instruction: [ add positions 7 and 5 | multiply positions 7 and 2 | call f on 4 and 5 | branch to foo if position 3 was LT else bar ]
ending up with the belt [ [7]+[5], [7]*[2], f([4],[5]) ... ]
or whatever you like. All you need to do to schedule the mill is to perform as many operations in parallel as the hardware can do, and then find out where their results would be placed to create the next instruction. |
|
As for scheduling my proposed store-addressed belt: you perform as many operations in parallel, then for each operation find the other operation that depends on the former's operation result, calculate the distance between them and assign it as the former's store address. The compiler has more work to do yes, but not "much more difficult".