by mrcrassic 3858 days ago
Linus's comments make a lot of sense but the fact of the matter is this: a number is a number is a number. If I run the same "flawed" benchmark suite on last year's iPhone 5, it will tell me that, yes, its CPU is indeed slower than that of a 6S. The tests might not holistically represent real-world usage but they output just enough data to tell me quantitatively how much better one phone performs over another in a vacuum.

The "real" benchmark, actually running automation on a cross section of phones against a few popular apps, would probably be a more useful test. But AnandTech and their cohorts don't seem to agree.

2 comments

i wasn't really referring to linus's comments although he is pretty much spot on, i hate to say (i have no great love for him XD).

you are right that numbers are numbers, but that says nothing about what they mean or measure. i can write a program that has benchmark in its name and just pulls any old number out, like how many milliseconds have passed since the last hour ticked over, and gives that back as a number. you will get meaningless results.
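
to make that concrete, here is a minimal sketch of such a "benchmark". the function name and the optional `now` parameter are made up for illustration; it measures nothing about the hardware and just reports the clock.

```python
from datetime import datetime


def fake_benchmark_score(now=None):
    """Return a "score": milliseconds since the last hour ticked over.

    Hypothetical example: the value depends only on when you run it,
    not on how fast the device is.
    """
    if now is None:
        now = datetime.now()
    # Whole seconds into the current hour, converted to milliseconds,
    # plus the millisecond part of the current second.
    return (now.minute * 60 + now.second) * 1000 + now.microsecond // 1000


if __name__ == "__main__":
    print("benchmark score:", fake_benchmark_score())
```

run it on two phones and whichever one you test later in the hour "wins", which is the point: it outputs a number, but the number is not tied to anything you care about.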

the argument the author of the geekbench software makes sounds kind of valid, but there is something i learned many years ago about optimisation and performance: measurement trumps everything. the fact that the futuremark physics demo uses bullet physics, complete with its own tight inner loops, is a great example of a measurement showing that this benchmark does not correlate with real world uses. the large number of other benchmarks that agree are even more data to support that argument.

that being said, his argument is technically baseless afaik too. no app, not even a simple renderer of a blank screen, will spend all of its time inside a short tight innermost loop - it will spend its time bouncing between several of them, doing a lot of waiting and having its time stolen at any given point by the OS. i'd be curious to know how he thinks these behaviours will go away in the future? the i-cache setup and L1 cache sizes are not enough for even some quite simple loops, even if they are 8-way associative and bristling with all the latest features - it's often quite trivial to construct pathologically bad cases using common programming practices - e.g. if you write your code in objective-c or swift, there is a lot of scope for cache misses and pollution. i can go into excruciating detail about this if you are interested and provide repeatable experiments to back my claims, i've done plenty of work on these things writing and optimising code... but this is long already. :)

Sure, measuring one iPhone against the next makes some sense with synthetic browser benchmarks, but I think where the hype breaks down is in using that browser benchmark to say that the iPad Pro can squash any other portable device out there. At the very least these benchmark results indicate that it's (surprise!) more of a mixed bag than has been reported.

Regarding the need for a "real" benchmark, from the article:

"While Geek Bench 3 attempts to create what its makers think is an accurate measure of CPU performance using seconds-long “real world” algorithms, BAPCo’s approach is actually more “real world.” BAPCo’s consortium of mostly hardware makers set out to create workloads across all the different platforms that would simulate what a person does, such as actually editing a photo with HDR, browsing the web, or sending email."

The author goes on to concede that they have custom apps for each platform to accomplish this task, but it seems that the TabletMark developers are aware of the exact issue you raise.