Hey HN, this is a CSV parser using Go 1.26's experimental simd/archsimd package. I wanted to see what the new SIMD API looks like in practice. CSV parsing is mostly "find these bytes in a buffer": load 64 bytes, compare, get a bitmask of positions. The interesting part was handling chunk boundaries correctly (quotes and line endings can split across chunks).

- Drop-in replacement for encoding/csv
- ~20% faster for unquoted data on AVX-512
- Quoted data is slower (still optimizing)
- Scalar fallback for non-AVX-512

Requires GOEXPERIMENT=simd.

https://github.com/nnnkkk7/go-simdcsv

Feedback on edge cases or the SIMD implementation welcome.
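For readers who have not seen the "compare, get a bitmask" pattern, here is a minimal scalar sketch of the idea in plain Go. It is not taken from go-simdcsv and does not use simd/archsimd: the loop in maskOf stands in for a single vector compare, and the names maskOf, prefixXOR, and delimiterOffsets are hypothetical. The prefix-XOR trick for tracking quoted regions is a commonly used technique that I am assuming here for illustration, not necessarily what the library does. The one piece of state carried between chunks is whether the previous chunk ended inside a quoted field.

```go
package main

import (
	"fmt"
	"math/bits"
)

// chunkSize mirrors the 64-byte (512-bit) width mentioned in the post.
const chunkSize = 64

// maskOf returns a mask with bit i set when chunk[i] == c.
// This scalar loop is a stand-in for one SIMD compare-to-mask step.
func maskOf(chunk []byte, c byte) uint64 {
	var m uint64
	for i, b := range chunk {
		if b == c {
			m |= 1 << uint(i)
		}
	}
	return m
}

// prefixXOR sets bit i to the XOR of bits 0..i of x. Applied to a quote
// mask, it marks every position from an opening quote up to, but not
// including, the matching closing quote.
func prefixXOR(x uint64) uint64 {
	x ^= x << 1
	x ^= x << 2
	x ^= x << 4
	x ^= x << 8
	x ^= x << 16
	x ^= x << 32
	return x
}

// delimiterOffsets returns the offsets of commas and newlines that are
// not inside a quoted field, processing the input one chunk at a time.
func delimiterOffsets(data []byte) []int {
	var out []int
	var inQuote uint64 // all ones if the previous chunk ended inside quotes
	for base := 0; base < len(data); base += chunkSize {
		end := base + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[base:end]

		// Flip the in-quote mask when the quote state carries over.
		inside := prefixXOR(maskOf(chunk, '"')) ^ inQuote
		hits := (maskOf(chunk, ',') | maskOf(chunk, '\n')) &^ inside

		// Walk the set bits from lowest to highest.
		for hits != 0 {
			out = append(out, base+bits.TrailingZeros64(hits))
			hits &= hits - 1 // clear the lowest set bit
		}

		// Broadcast the top bit: it holds the quote state at chunk end.
		inQuote = uint64(int64(inside) >> 63)
	}
	return out
}

func main() {
	// 60 filler bytes, then a quoted field that straddles the 64-byte boundary.
	data := make([]byte, 0, 80)
	for i := 0; i < 60; i++ {
		data = append(data, 'a')
	}
	data = append(data, ",\"x,y,z\",b\n"...)
	fmt.Println(delimiterOffsets(data))
}
```

Because an escaped quote ("") toggles the parity twice, it needs no special handling in this sketch, even when the two quote bytes land in different chunks. CRLF line endings and malformed quoting are deliberately left out.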
I went on a similar adventure, but in Zig. Since I had to put together a benchmarking suite anyway, I published it in case anyone needs it. If you think it might be helpful, give it a go: https://github.com/peymanmortazavi/csv-race
In my testing, using 64-byte (512-bit) chunks actually degraded performance, even where they were available. I also had to fine-tune the chunk size for different CPUs. For instance, on Apple hardware I could go much higher, but on my own CPU going up to 64 bytes (512 bits) made things slower.
Another thing I explored was iterating over fields rather than records. This lets you avoid copying and dynamic memory allocation altogether, which should give you a pretty decent boost. You can add utility wrappers to match Go's record-based iteration when necessary.
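To make the field-iteration idea concrete, here is a rough sketch in Go (not the Zig code from csv-race). The FieldIter type and its Next and Record methods are hypothetical names, and the sketch assumes unquoted, newline-terminated input to stay short. Each field is returned as a subslice of the input buffer, so nothing is copied; the Record wrapper shows how record-based iteration could be layered on top while reusing a caller-supplied slice.

```go
package main

import (
	"bytes"
	"fmt"
)

// FieldIter is a hypothetical iterator that yields one field at a time.
// It assumes unquoted, newline-terminated CSV input.
type FieldIter struct {
	buf []byte
	pos int
}

// Next returns the next field as a subslice of the input (no copy),
// whether that field ends its record, and ok=false once input is exhausted.
func (it *FieldIter) Next() (field []byte, endOfRecord, ok bool) {
	if it.pos >= len(it.buf) {
		return nil, false, false
	}
	rest := it.buf[it.pos:]
	i := bytes.IndexAny(rest, ",\n")
	if i < 0 {
		// No delimiter left: treat the remainder as the last field.
		it.pos = len(it.buf)
		return rest, true, true
	}
	it.pos += i + 1
	return rest[:i], rest[i] == '\n', true
}

// Record is a convenience wrapper that gathers fields into dst until the
// record ends. Passing the same dst on every call reuses its backing array.
func (it *FieldIter) Record(dst [][]byte) ([][]byte, bool) {
	dst = dst[:0]
	for {
		f, eor, ok := it.Next()
		if !ok {
			return dst, len(dst) > 0
		}
		dst = append(dst, f)
		if eor {
			return dst, true
		}
	}
}

func main() {
	it := &FieldIter{buf: []byte("a,bb\nc,,d\n")}
	var rec [][]byte
	for {
		var ok bool
		rec, ok = it.Record(rec)
		if !ok {
			break
		}
		fmt.Printf("%q\n", rec)
	}
}
```

The trade-off is that the returned slices alias the input buffer, so they are only valid for as long as the caller keeps that buffer unchanged.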
Just some thoughts, but congrats on this!