by WithinReason
92 days ago
This is 11 bit ops and a subtract, which I assume is ~11 clocks, while you can just do:

  l1 = dot(A[:11000000],B[:11000000])
  l2 = dot(A[:00110000],B[:00110000])
  l3 = dot(A[:00001100],B[:00001100])
  l4 = dot(A[:00000011],B[:00000011])
  result = l1 + l2 * 4 + l3 * 16 + l4 * 64

which is 8 bit ops and 4x8 bit dots, which is likely 8 clocks with less serial dependence
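
For anyone who wants to check the arithmetic, here is a minimal sketch of the masked-dot idea in plain Python. It rests on assumptions that are not in the comment: both A and B pack four unsigned 2-bit values per byte, first value in the top bits, and the fields are masked in place without shifting. The names `pack`, `masked_dot`, `packed_dot` and `reference_dot` are made up for illustration. Under this layout each partial dot carries a different power-of-4 scale, so the rescale weights come out as 1, 16, 256, 4096; the 1, 4, 16, 64 weights above would be the right ones if only one operand carried the in-place field scale.

```python
# Assumed layout: four unsigned 2-bit fields per byte, first field in bits 7-6.
MASKS = (0b11000000, 0b00110000, 0b00001100, 0b00000011)

# Both operands are masked in place, so a partial dot over the field at bit
# offset s is scaled by 2**(2*s). These weights lift every partial dot to the
# common scale 4096 (they are derived for this layout, not taken from the post).
WEIGHTS = (1, 16, 256, 4096)


def pack(fields):
    """Pack groups of four 2-bit values (each 0..3) into bytes."""
    out = []
    for i in range(0, len(fields), 4):
        a, b, c, d = fields[i:i + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    return out


def masked_dot(A, B, mask):
    """Dot product of the in-place masked fields (one AND per operand)."""
    return sum((a & mask) * (b & mask) for a, b in zip(A, B))


def packed_dot(A, B):
    """Combine four independent masked dots, then drop the common scale."""
    total = sum(w * masked_dot(A, B, m) for m, w in zip(MASKS, WEIGHTS))
    return total >> 12  # exact, since every term is a multiple of 4096


def reference_dot(x, y):
    """Plain dot product over the unpacked 2-bit values."""
    return sum(p * q for p, q in zip(x, y))


x = [3, 0, 1, 2, 1, 1, 2, 3]
y = [1, 2, 3, 0, 2, 2, 1, 3]
print(packed_dot(pack(x), pack(y)), reference_dot(x, y))
```

The four masked dots do not depend on each other, which is the "less serial dependence" point: only the final weighted sum joins them. This says nothing about actual clock counts, which depend on the hardware.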