by smspf 2181 days ago
So many wrong assumptions ...

1. If emulating aarch64 (arm64) on x86_64 is 6x slower (on your system, btw; it's not a universal constant), it doesn't mean emulating x86_64 on aarch64 will be 6x slower. It'd probably be worse, or at least that's my gut feeling.

2. Generic container images like the Ubuntu mentioned usually have aarch64 (arm64) support, so running the x86_64 image makes no sense for the presented use-case.

3. You won't be able to use most software because they don't release ARM binaries ... and the example uses `wget` && `tar xf`, with no binary signature check. As someone who has been porting stuff from x86_64 to aarch64 for a couple of years, I admit I've seen this pattern frequently. The most obvious solution is to build from sources, which would have been better off on x86_64 too, instead of fetching a prebuilt (and unverified) binary from the internet. Maybe there are some CPU flags the compiler could notice and apply optimizations which are not included in the prebuilt binary.
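
As a rough illustration of the missing check in point 3, here is a minimal sketch that refuses to extract a tarball unless its SHA-256 digest matches an expected value. The function name `verify_and_extract` and its arguments are hypothetical, and it assumes a `sha256sum` binary (GNU coreutils or compatible) plus a digest obtained from a trusted source. A checksum is weaker than a real signature check, so treat this as the bare minimum rather than a complete answer.

```shell
# Hypothetical helper: extract a tarball only if its SHA-256 digest matches.
# Usage: verify_and_extract <tarball> <expected-sha256> <dest-dir>
verify_and_extract() {
  local tarball="$1" expected="$2" dest="$3" actual
  # sha256sum prints "<digest>  <file>"; keep only the digest column.
  actual="$(sha256sum "$tarball" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for ${tarball}, refusing to extract" >&2
    return 1
  fi
  # Only reached when the digest matched.
  mkdir -p "$dest" && tar xf "$tarball" -C "$dest"
}
```

In a Dockerfile this would sit between the `wget` and the `tar xf` steps, so a tampered or truncated download fails the build instead of being unpacked.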

I'm not an Apple fan and I'm certainly not a fan of cross-architecture development either. I do agree with the general idea behind the article; however, I find it a bit hand-wavy.

3 comments

> Generic container images like the Ubuntu mentioned usually have aarch64 (arm64) support, so running the x86_64 image makes no sense for the presented use-case.

I think the argument here is that you can't build the Docker images you use in production and run them on your Mac without emulation (unless your production workload also runs on ARM).
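
To make the "without emulation" condition concrete, here is a small sketch that compares the host architecture with an image's target architecture. The helper names `docker_arch` and `needs_emulation` are made up for illustration; the only assumption is the usual naming, where `uname -m` reports `x86_64` or `aarch64` while Docker platform strings use `amd64` and `arm64`.

```shell
# Hypothetical helper: map a `uname -m` machine name to the architecture
# name Docker uses in platform strings such as linux/amd64.
docker_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo unknown ;;
  esac
}

# Exits 0 when an image built for architecture $2 would need emulation on host $1.
needs_emulation() {
  [ "$(docker_arch "$1")" != "$(docker_arch "$2")" ]
}

host="$(uname -m)"
if needs_emulation "$host" x86_64; then
  echo "x86_64 images would run under emulation on this ${host} host"
else
  echo "x86_64 images run natively on this ${host} host"
fi
```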

That's a fair point. Emulation implies other limitations too: code compiled under emulation might use only the CPU features the emulator exposes, which would lead to sub-optimal binaries, not to mention much slower builds.

If you don’t have an environment between your laptop and prod, you've got more things wrong than this ARM migration.

Author here.

> 1. If emulating aarch64 (arm64) on x86_64 is 6x slower (on your system, btw; it's not a universal constant), it doesn't mean emulating x86_64 on aarch64 will be 6x slower. It'd probably be worse, or at least that's my gut feeling.

Yup, performance benchmarks are inherently flawed and nobody knows anything right now without the hardware. However, if ARM -> x86 emulation is anything like x86 -> ARM emulation, I would expect a really big performance loss.

> 2. Generic container images like the Ubuntu mentioned usually have aarch64 (arm64) support, so running the x86_64 image makes no sense for the presented use-case.

Ah, actually I address this in the article, and even run an arm64 image. The short version is that it would be a lot of work to convert your whole backend infrastructure to ARM just because you got a new laptop.

> 3. You won't be able to use most software because they don't release ARM binaries ... and the example uses `wget` && `tar xf`, with no binary signature check. As someone who has been porting stuff from x86_64 to aarch64 for a couple of years, I admit I've seen this pattern frequently. The most obvious solution is to build from sources, which would have been better off on x86_64 too, instead of fetching a prebuilt (and unverified) binary from the internet. Maybe there are some CPU flags the compiler could notice and apply optimizations which are not included in the prebuilt binary.

Yes, if only everything were built from source! I'm not saying there's no solution, just that the solution would be a lot of work. If the library is obscure enough and the errors are strange enough, it might be so much work as to be effectively impossible for a busy web developer.

My goal was to write a kind of hand-wavy article to get people talking about this problem.

I agree on the performance loss. Just for kicks, I ran the same commands on some real aarch64 hardware (32 cores, 3.0GHz, ARMv8.? - can't remember and already logged off the machine, but I can double check tomorrow). Without further context, here are the numbers:

  someuser@some-aarch64-machine:~$ docker run arm64v8/ubuntu bash -c 'dd if=/dev/urandom bs=4k count=10k | gzip > /dev/null'
  10240+0 records in
  10240+0 records out
  41943040 bytes (42 MB, 40 MiB) copied, 2.18298 s, 19.2 MB/s
  someuser@some-aarch64-machine:~$ docker run amd64/ubuntu bash -c 'dd if=/dev/urandom bs=4k count=10k | gzip > /dev/null'
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  10240+0 records in
  10240+0 records out
  41943040 bytes (42 MB, 40 MiB) copied, 6.72324 s, 6.2 MB/s

Awesome, thanks for testing this out!

A 3x slowdown is not as bad as 6x, but it's still quite a bit. I also saw a slowdown of ~4x when I tried this experiment on a native Linux x86_64 machine emulating ARM - perhaps the Mac -> Linux virtualization slowed it down further.
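
For reference, the ~3x figure can be recomputed from the two `dd` runs posted above. This is just the arithmetic; `ratio` is a made-up helper name, and the inputs are copied from that output.

```shell
# Hypothetical helper: print a / b rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 19.2 6.2          # native MB/s over emulated MB/s, prints 3.1
ratio 6.72324 2.18298   # emulated seconds over native seconds, prints 3.1
```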

5x may have been a bit alarmist, but regardless we should brace ourselves for a big performance hit when emulating x86_64.

I'm surprised it's only a 3x slowdown. But the single-thread performance of native execution (without emulation) is worse on aarch64, which was expected. IMO, a better benchmark would take multithreaded performance into account, with and without emulation.
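
One possible shape for such a multithreaded variant, as a sketch only: run several copies of the same `dd | gzip` pipeline concurrently and time the whole batch. The function name `parallel_gzip_bench` and its parameters are hypothetical, and `date +%s` only gives whole-second resolution, so the block count needs to be large enough for the run to take several seconds.

```shell
# Hypothetical helper: run N copies of the dd | gzip pipeline concurrently
# and print the wall-clock seconds for the whole batch.
# Usage: parallel_gzip_bench <workers> <4k-blocks-per-worker>
parallel_gzip_bench() {
  local workers="$1" blocks="$2" start end i
  start="$(date +%s)"
  i=0
  while [ "$i" -lt "$workers" ]; do
    # Each worker compresses its own stream of random data in the background.
    dd if=/dev/urandom bs=4k count="$blocks" 2>/dev/null | gzip > /dev/null &
    i=$((i + 1))
  done
  wait   # block until every background pipeline has finished
  end="$(date +%s)"
  echo "$((end - start))"
}
```

Running it inside both the arm64v8/ubuntu and amd64/ubuntu containers, with the worker count set to the number of cores, would give a with/without-emulation comparison under load.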

Yes, agreed. And the examples shown are not fair. There are a lot of optimizations one can do in Docker, especially when dealing with I/O workloads (the dd example in the article). Cloud providers have been doing this for a long, long time already. Why the author did not mention those remains to be seen.