by berkut 547 days ago
A 256-entry float32 LUT for 8-bit sRGB -> linear conversion is definitely still faster than doing the division live (I re-benchmarked it on Zen 4 and Apple M3 last month). However, floating-point division on newer microarchitectures is not as slow as it was on processors from 10 or so years ago, so I can imagine that a much larger LUT is not worth it.

Does this include vectorized code? I stopped using LUTs for anything “trivial” probably 20 years ago because I rarely see any improvement, in particular one that would noticeably benefit the overall runtime.