No benchmarks. No FLOPs. No comparison to commodity hardware. This is what I hate about cloud servers: "9 is faster than 8 which is faster than 7 which is faster than 6, ..., which is faster than 1, which has unknown performance".
How: you run the test on a bunch of hosts and create a spec from the ranges you observe.
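A minimal sketch of turning per-host results into a published range, assuming each host ran the same benchmark and produced one scalar score (higher is better). The function name `spec_from_ranges` and the host names are hypothetical, chosen only for illustration.

```python
import statistics


def spec_from_ranges(scores_by_host):
    """Summarize one benchmark score per host into a min/median/max range.

    Assumption: scores_by_host maps a host identifier to a single scalar
    score from the same benchmark, where higher means faster.
    """
    scores = sorted(scores_by_host.values())
    if not scores:
        raise ValueError("need results from at least one host")
    return {
        "hosts": len(scores),
        "min": scores[0],
        "median": statistics.median(scores),
        "max": scores[-1],
    }


# Hypothetical results from three hosts of the same instance type.
spec = spec_from_ranges({"host-a": 102.0, "host-b": 95.5, "host-c": 110.0})
print(spec)
```

Publishing the minimum alongside the median matters here: the minimum is what a customer can count on, while the median is what they will typically see.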
Why: you might be concerned with network connectivity (you don't get to choose which data center you launch in, and the data centers might not be exactly equal), noisy neighbors on shared hosts, etc. If you're measuring networking, you're probably spinning up separate accounts or using a bank of accounts, and launching something in every AZ until you find what you're looking for.
I’ve had terrible luck benchmarking EC2. Measurements are too noisy to be repeatable: the same instance of the wrong type can swing by double-digit percentages when tested twice, an hour apart.
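One way to put a number on that noise is sketched below, assuming each run yields a single throughput-style score. The helper names are hypothetical, and the 10% threshold is simply a literal reading of "double-digit"; the coefficient of variation is a standard spread measure for more than two repeated runs.

```python
import statistics


def percent_swing(first, second):
    """Relative change of the second run versus the first, in percent."""
    return (second - first) / first * 100.0


def is_double_digit_swing(first, second, threshold=10.0):
    """True if two runs differ by at least `threshold` percent (assumed 10%)."""
    return abs(percent_swing(first, second)) >= threshold


def coefficient_of_variation(runs):
    """Sample standard deviation over mean, in percent; needs at least 2 runs."""
    return statistics.stdev(runs) / statistics.mean(runs) * 100.0


# Hypothetical scores from the same instance, measured an hour apart.
print(percent_swing(100.0, 88.0), is_double_digit_swing(100.0, 88.0))
```

If the coefficient of variation across repeated runs on one instance is as large as the difference you are trying to measure between instance types, the comparison is not repeatable.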