They exist today, but they were added after AVX. Every year we figure out how to cram more transistors into a square millimeter, and once the low-hanging fruit had been added and we knew how to fit even more transistors, we started adding more and more specialized functions.
That is Linus's point. He would have preferred to spend that increase in transistor count on other things, like more cache.
More cache has diminishing returns, because cache wants to be as close as possible to the core logic. And modern CPUs are mostly cache anyway. Special-purpose blocks for common compute tasks are quite cheap.