We have very different experiences then. "Small" scale to me is n < 64, which is right on the threshold of whether it's faster to run an n^2 algorithm that could be nlogn with an FFT (for 1-dimensional vectors) - depending on the algorithm, implementation, and hardware of course.
I've benchmarked FFT/non-FFT based approaches for almost everything in that scale for the last ten years and gotten quite different results.
An interesting case is two-dimensional interpolation. For example, for precise image resampling. In that case, due to the separability, the FFT is still O(nlog(n)), thus just as fast and much more precise than space-domain computations with 2D splines. Try to rotate 10 degrees iteratively an image 36 times with splines and with fft-based methods. Even 7 tap splines (which is a huge order) will utterly destroy the image contents; but it is just peanuts to the fft.
I've benchmarked FFT/non-FFT based approaches for almost everything in that scale for the last ten years and gotten quite different results.