by messe 2011 days ago
Yep, looks like simdjson is defaulting to a generic fallback implementation. I added the following to the start of main:

    const simdjson::implementation *impl = simdjson::active_implementation;
    std::cout << "simdjson is optimized for " << impl->name() << "(" << impl->description() << ")" << std::endl;
When built for Intel/Rosetta, it prints:

    x86_64% ./benchmark 
    simdjson is optimized for westmere(Intel/AMD SSE4.2)
    minify : 4.44883 GB/s
    validate: 5.39216 GB/s
On arm64:

    arm64% ./benchmark
    simdjson is optimized for fallback(Generic fallback implementation)
    minify : 1.02521 GB/s
    validate: inf GB/s
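A side note on that `inf`: the benchmark source isn't shown here, so this is only a guess, but an infinite throughput figure is what you'd see if the measured elapsed time came out as zero (for example, because the compiler optimized the work away). A minimal sketch of a throughput calculation, with hypothetical helper names `gigabytes_per_second` and `looks_valid` that are not part of simdjson or the benchmark, shows how that arises and how a sanity check could flag it:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not taken from the benchmark in this thread.
// On IEEE 754 platforms, a positive byte count divided by 0.0 seconds
// yields +inf, which a stream would print as "inf".
inline double gigabytes_per_second(std::size_t bytes, double seconds) {
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Flags results that are infinite, NaN, or non-positive, since those
// usually mean the timed region did no measurable work.
inline bool looks_valid(double gbps) {
  return std::isfinite(gbps) && gbps > 0.0;
}
```

Calling `looks_valid(gigabytes_per_second(n, elapsed))` before printing would make a zero-time measurement stand out instead of showing up as a seemingly great number.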
simdjson's mess of CPP macros isn't properly detecting ARM64. By manually setting -DSIMDJSON_IMPLEMENTATION_ARM64=1 on the command line, I got the following results:

    arm64% c++ -O3 -DSIMDJSON_IMPLEMENTATION_ARM64=1 -o benchmark benchmark.cpp simdjson.cpp -std=c++11
    arm64% ./benchmark 
    simdjson is optimized for arm64(ARM NEON)
    minify : 6.64657 GB/s
    validate: 16.3949 GB/s
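For anyone unfamiliar with how this kind of compile-time detection usually works, here is a rough sketch. It is not simdjson's actual logic, which lives in its own headers and is more involved; the `DEMO_*` macro names and `demo_compiled_arch` are made up for illustration. The only real things in it are the predefined compiler macros: `__aarch64__` (GCC/Clang) and `_M_ARM64` (MSVC) for 64-bit ARM, `__x86_64__` and `_M_X64` for x86-64.

```cpp
#include <string>

// Illustrative macros only; these names are hypothetical.
#if defined(__aarch64__) || defined(_M_ARM64)
#define DEMO_TARGET_ARM64 1
#else
#define DEMO_TARGET_ARM64 0
#endif

#if defined(__x86_64__) || defined(_M_X64)
#define DEMO_TARGET_X86_64 1
#else
#define DEMO_TARGET_X86_64 0
#endif

// A translation unit can only be compiled for one architecture at a time.
static_assert(!(DEMO_TARGET_ARM64 && DEMO_TARGET_X86_64),
              "conflicting architecture macros");

// Returns the architecture this translation unit was compiled for.
// If neither branch matches, a library would typically select its
// generic fallback code path.
inline std::string demo_compiled_arch() {
#if DEMO_TARGET_ARM64
  return "arm64";
#elif DEMO_TARGET_X86_64
  return "x86_64";
#else
  return "other";
#endif
}
```

If a check like this misses the target, everything still compiles and runs, just on the slow path, which is consistent with the fallback result above.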
EDIT: Interestingly, compiling with -Os nets a slight improvement to the validate benchmark:

    arm64% c++ -Os -DSIMDJSON_IMPLEMENTATION_ARM64=1 -o benchmark benchmark.cpp simdjson.cpp -std=c++11
    arm64% ./benchmark
    simdjson is optimized for arm64(ARM NEON)
    minify : 6.649 GB/s
    validate: 17.1456 GB/s
2 comments

Thanks for getting to the bottom of this.

Looks like -Oz bumps validate up another few percent.

    % c++ -Oz -DSIMDJSON_IMPLEMENTATION_ARM64=1 -o benchmark benchmark.cpp simdjson.cpp -std=c++11
    % ./benchmark
    minify : 6.73381 GB/s
    validate: 17.8548 GB/s
Still a bit slower but much more competitive. Thanks for the additional investigation/validation!