To get the prediction variance in a Bayesian treatment, you integrate over the posterior of the parameters - surely computing or approximating the posterior counts as considering parameter variance?
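To make "integrate over the posterior" concrete, here is a minimal sketch for the simplest conjugate case: normal data with known noise standard deviation and a normal prior on the mean. The model, the function name `posterior_predictive`, and all numeric values are assumptions chosen for illustration, not something from this discussion. In this case the predictive variance is the noise variance plus the posterior variance of the mean, and a Monte Carlo integration over posterior draws should roughly agree.

```python
import numpy as np

def posterior_predictive(x, sigma, m0, s0):
    """Normal likelihood with known sigma, prior mu ~ N(m0, s0^2).

    Returns the posterior mean and variance of mu, and the variance of
    the posterior predictive distribution for a new observation.
    """
    n = len(x)
    post_prec = 1.0 / s0**2 + n / sigma**2           # precisions add
    post_var = 1.0 / post_prec
    post_mean = post_var * (m0 / s0**2 + np.sum(x) / sigma**2)
    pred_var = sigma**2 + post_var                   # noise + parameter uncertainty
    return post_mean, post_var, pred_var

rng = np.random.default_rng(0)
sigma, m0, s0 = 2.0, 0.0, 10.0                       # illustrative values only
x = rng.normal(1.5, sigma, size=20)                  # simulated data

post_mean, post_var, pred_var = posterior_predictive(x, sigma, m0, s0)

# Monte Carlo version of the same integral: draw mu from the posterior,
# then draw a new observation given each mu.
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=200_000)
y_draws = rng.normal(mu_draws, sigma)
mc_pred_var = y_draws.var()

print("analytic predictive variance:", pred_var)
print("Monte Carlo predictive variance:", mc_pred_var)
```

The gap between `pred_var` and `sigma**2` is exactly the contribution of parameter uncertainty in this toy model.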
Of course it does. You can put hyperpriors on the priors, and hyper hyperpriors on the hyperpriors, but the regress has to stop somewhere. What is your point?
I'm not sure I entirely follow your comment. I was merely pointing out that reckoning with parameter uncertainty by "computing or approximating the posterior...", as you said, is not always applicable in probabilistic ML.
Yes, but that's true of all statistics. You have to make some assumptions to get off the ground. If you estimate parameter variance the frequentist way, you also make assumptions about the parameter distribution.
No, this is expressly untrue. In the frequentist paradigm, parameters are fixed but unknown. They are not random variables and have no implicit probability distribution associated with them.
An estimator (of a parameter) is a random variable, as it is a function of random variables. However, its distribution depends only on the data distribution; there is no other implicit distribution on which it depends.
For instance, the maximum likelihood estimator of the mean of a normal distribution is itself normally distributed. However, this does not imply that the mean parameter has a normal prior. It has no prior, as it is a fixed quantity.
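A quick simulation illustrates the point. This is only a sketch with made-up values (`mu`, `sigma`, `n`, and `reps` are my own choices): the mean is held at one fixed constant throughout and is never drawn from any distribution, yet the MLE (the sample mean) still varies from dataset to dataset, with variance close to sigma^2 / n.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 3.0, 2.0, 25   # mu is a fixed constant; it is never sampled
reps = 100_000                # number of hypothetical repeated datasets

# Each row is one dataset of size n drawn from N(mu, sigma^2).
samples = rng.normal(mu, sigma, size=(reps, n))

# The MLE of the mean of a normal distribution is the sample mean.
mle = samples.mean(axis=1)

emp_mean = mle.mean()         # should be close to mu
emp_var = mle.var()           # should be close to sigma^2 / n
theory_var = sigma**2 / n

print("mean of the MLE across datasets:", emp_mean)
print("variance of the MLE across datasets:", emp_var)
print("theoretical variance sigma^2 / n:", theory_var)
```

All of the randomness in `mle` comes from the data-generating step; nothing in the code places a distribution on `mu`.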