by bmalehorn 2181 days ago
Author here.

> 1. If emulating aarch64 (arm64) on x86_64 is 6x slower (on your system, btw, it's not a universal constant), it doesn't mean emulating x86_64 on aarch64 will be 6x slower. It'd probably be worse, or at least that's my gut feeling.

Yup, performance benchmarks are inherently flawed and nobody knows anything right now without the hardware. However, if ARM -> x86 emulation is anything like x86 -> ARM emulation, I would expect a really big performance loss.

> 2. Generic container images like the Ubuntu mentioned usually have aarch64 (arm64) support, so running the x86_64 image makes no sense for the presented use-case.

Ah, actually I address this in the article, and even run an arm64 image. The short version is that it would be a lot of work to convert your whole backend infrastructure to ARM just because you got a new laptop.

> 3. You won't be able to use most software because they don't release ARM binaries ... and the example uses `wget` && `tar xf`, with no binary signature check. As someone who has been porting stuff from x86_64 to aarch64 for a couple of years, I admit I've seen this pattern frequently. The most obvious solution is to build from sources, which would have been better off on x86_64 too, instead of fetching a prebuilt (and unverified) binary from the internet. Maybe there are some CPU flags the compiler could notice and apply optimizations which are not included in the prebuilt binary.

Yes, if only everything were built from source! I'm not saying there's no solution, just that the solution would be a lot of work. If the library is obscure enough and the errors are strange enough, it might be so much work as to be effectively impossible for a busy web developer.

My goal was to write a kind of hand-wavy article to get people talking about this problem.


I agree on the performance loss. Just for kicks, I ran the same commands on some real aarch64 hardware (32 cores, 3.0GHz, ARMv8.? - can't remember and already logged off the machine, but I can double check tomorrow). Without further context, here are the numbers:

  someuser@some-aarch64-machine:~$ docker run arm64v8/ubuntu bash -c 'dd if=/dev/urandom bs=4k count=10k | gzip > /dev/null'
  10240+0 records in
  10240+0 records out
  41943040 bytes (42 MB, 40 MiB) copied, 2.18298 s, 19.2 MB/s
  someuser@some-aarch64-machine:~$ docker run amd64/ubuntu bash -c 'dd if=/dev/urandom bs=4k count=10k | gzip > /dev/null'
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  10240+0 records in
  10240+0 records out
  41943040 bytes (42 MB, 40 MiB) copied, 6.72324 s, 6.2 MB/s

Awesome, thanks for testing this out!

A 3x slowdown is not as bad as 6x, but it's still quite a bit. I also saw a slowdown of ~4x when I tried this experiment on a native Linux x86_64 running ARM - perhaps the Mac -> Linux virtualization slowed it down further.
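
For reference, the ~3x figure follows directly from the two `dd` wall-clock times reported above. A minimal sketch of the arithmetic, where `slowdown` is a hypothetical helper name (not something from the article or from Docker) and the two timings are copied from the output in this thread:

```shell
# Hypothetical helper: ratio of emulated to native wall-clock time.
# $1 = native seconds, $2 = emulated seconds.
slowdown() {
  awk -v native="$1" -v emulated="$2" 'BEGIN { printf "%.2f\n", emulated / native }'
}

# Timings (in seconds) copied from the dd output above:
# arm64v8/ubuntu ran natively, amd64/ubuntu ran under emulation.
slowdown 2.18298 6.72324   # prints 3.08
```

This is a single run of a single-threaded pipeline, so the ratio is only a rough indication.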

6x may have been a bit alarmist, but regardless we should brace ourselves for a big performance hit on x86_64 emulation.

I'm surprised it's only a 3x slowdown. But the single-thread performance of native execution (without emulation) is worse on aarch64, which was expected. IMO, a better benchmark would take into account multithreaded performance with and without emulation.
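
One rough way to approach a multithreaded comparison is to run several copies of the same `dd | gzip` pipeline at once and time the whole batch. The sketch below is only an illustration under assumptions: `parallel_bench` is a hypothetical helper, the worker count and block count are arbitrary, and timing uses whole seconds from `date`, so it is coarse.

```shell
# Hypothetical helper (not from the thread): run N copies of the same
# dd | gzip pipeline concurrently and print the elapsed whole seconds.
parallel_bench() {
  workers="$1"        # number of concurrent pipelines
  count="${2:-10k}"   # 4 KiB blocks per pipeline, as in the original command
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$workers" ]; do
    dd if=/dev/urandom bs=4k count="$count" 2>/dev/null | gzip > /dev/null &
    i=$((i + 1))
  done
  wait                # block until every background pipeline has exited
  end=$(date +%s)
  echo $((end - start))
}

# Small example run: 4 workers, 1k blocks (4 MiB) each.
parallel_bench 4 1k
```

To compare native and emulated runs, the same function body would be pasted into the `bash -c '...'` string for each image, with the worker count set to the number of cores on the machine.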