by vintagedave 196 days ago
> XP-era hardware doesn’t have AVX. Probably doesn’t have AVX2 or FMA either. But SSE4.2 is safe for most 64-bit CPUs from 2008 onward:

It won't; FMA only arrived with AVX2-era CPUs. If you target 32-bit, you'd only be "safe" with SSE2... if you really want a challenge, you'd use the Pentium Pro CPU feature set, i.e. the x87 FPU.

I have to admit I'd be really curious what that looked like! You'd definitely want to use the fast math option.

This is an awesome effort, btw, and I enjoyed reading your blog immensely.

1 comment

Oh darn, you're absolutely right (pun intended) about the 32-bit situation. SSE2 is really the "floor" there if you want any kind of reasonable compatibility. I was being a bit optimistic with SSE4.2 even for 64-bit - it's safe for most chips from that era but definitely not all: Core 2 and the early 64-bit AMD chips don't have it.
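
For anyone who wants to check where their own chip lands, here's a rough sketch, assuming a Linux box where `/proc/cpuinfo` has a `flags` line. The tier names and the `highest_tier` / `read_cpu_flags` helpers are made up for illustration; the flag spellings (`fpu`, `sse2`, `sse4_2`, `avx`, `avx2`, `fma`) are the ones Linux reports.

```python
# Hypothetical helper: classify a CPU by the flags Linux lists in /proc/cpuinfo.
# Tiers are ordered from most to least capable; a tier needs all of its flags.
TIERS = [
    ("avx2+fma", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse4.2", {"sse4_2"}),
    ("sse2", {"sse2"}),
    ("x87", {"fpu"}),
]


def highest_tier(flags):
    """Return the name of the first tier whose flags are all present."""
    present = set(flags)
    for name, required in TIERS:
        if required <= present:
            return name
    return "none"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the first 'flags' line; returns an empty list if there is none."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.partition(":")[2].split()
    except OSError:
        pass
    return []


if __name__ == "__main__":
    print(highest_tier(read_cpu_flags()))
```

On anything that isn't x86 Linux this will just report "none", since there is no `flags` line to read.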

The Pentium Pro challenge though... pure x87 FPU inference? That would be gloriously cursed. You'd basically be doing matrix math like it's 1995. `-mfpmath=387` and pray.

I'm genuinely tempted to try this now. The build flags would be something like:

  -DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF \
  -DGGML_F16C=OFF -DGGML_SSE42=OFF -DGGML_SSSE3=OFF \
  -DGGML_SSE3=OFF -DGGML_SSE2=OFF  # pain begins here

And then adding `-ffast-math` to `CMAKE_C_FLAGS` because at that point, who cares about IEEE 754 compliance, we're running a transformer on hardware that predates Google.
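
Spelled out as a sketch: turn every SIMD option off and pass the x87 and fast-math compiler flags. The option names are copied straight from the flag list above and I haven't verified them against any particular ggml version; `cmake_args` is a hypothetical helper, not something from the project.

```python
# Option names copied from the flag list above; unverified against any ggml release.
SIMD_OPTIONS = ["AVX", "AVX2", "FMA", "F16C", "SSE42", "SSSE3", "SSE3", "SSE2"]


def cmake_args(c_flags=("-mfpmath=387", "-ffast-math")):
    """Build a cmake argument list that disables every SIMD option."""
    args = [f"-DGGML_{name}=OFF" for name in SIMD_OPTIONS]
    # Compiler flags go in as a single CMAKE_C_FLAGS definition.
    args.append("-DCMAKE_C_FLAGS=" + " ".join(c_flags))
    return args


if __name__ == "__main__":
    # Print one argument per line, shell-continuation style.
    print(" \\\n  ".join(cmake_args()))
```

A real Pentium Pro build would presumably also need something like `-m32 -march=pentiumpro` in `c_flags`, which this doesn't attempt.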

If someone actually has a Pentium Pro lying around and wants to see Qwen-0.5B running on it... that would be the ultimate read for me as well.

Thanks for the kind words. Always fun to find fellow retro computing degenerates in the wild.