|
My apologies. I misread the code and what it did, because of the use of the word 'sorted' and my belief that branch prediction should not affect this code. I still don't believe it's due to branch prediction. I believe it's due to cache coherency. Here's a more extensive example, where I shuffle the last N elements: import time
from random import shuffle
x = [i for i in range(1000000)]
tick=time.clock()
sum1=0
for i in x:
sum1=sum1+i
toc=time.clock()
print "ref", toc-tick, sum1
for N in (0, 250000, 500000, 750000, 1000000):
y1 = range(0, N)
y2 = range(N, 1000000)
shuffle(y2)
y = y1 + y2
tick=time.clock()
sum1=0
for i in y:
sum1=sum1+i
toc=time.clock()
print "shuffle last %d: %f %d" % (len(x)-N, toc-tick, sum1)
assert x == y, "These should be identical when N==1000000"
print "== Two summations of the same array =="
tick=time.clock()
sum1=0
for i in x:
sum1=sum1+i
toc=time.clock()
print "ref (again)", toc-tick, sum1
tick=time.clock()
sum1=0
for i in y:
sum1=sum1+i
toc=time.clock()
print "final range (again)", toc-tick, sum1
When I run the above, the output is: ref 0.158759 499999500000
shuffle last 1000000: 0.254425 499999500000
shuffle last 750000: 0.228932 499999500000
shuffle last 500000: 0.252207 499999500000
shuffle last 250000: 0.224307 499999500000
shuffle last 0: 0.255656 499999500000
== Two summations of the same array ==
ref (again) 0.158914 499999500000
final range (again) 0.250253 499999500000
As you can see, the last two tests use arrays x and y where x == y == range(1000000), but their timing numbers are quite different. Branch prediction shouldn't cause that difference. Poor cache behavior might.Here's an attempt to double-check my view. I use an array.array(), which stores the numbers sequentially both for the sorted and random versions: import array
import time
from random import shuffle
x = [i for i in range(1000000)]
x = array.array("i", x)
tick=time.clock()
sum1=0
for i in x:
sum1=sum1+i
toc=time.clock()
print "sorted", toc-tick, sum1
x = [i for i in range(1000000)]
shuffle(x)
x = array.array("i", x)
tick=time.clock()
sum1=0
for i in x:
sum1=sum1+i
toc=time.clock()
print "unsorted", toc-tick, sum1
When I run the above, I find that both timings are essentially identical: sorted 0.159446 499999500000
unsorted 0.161541 499999500000
This gives very strong evidence (unless I made another mistake) that branch prediction does not play a role, while cache prediction should be the actual suspect. |