|
> You didn't really address the examples which mschuster91 provided and that's important to understanding the problem:

On the contrary, I addressed them entirely. mschuster91 seems to be under the impression that the teacher evaluation schemes boil down to nothing but the simplest possible before-after comparison of students' grades, ignoring all issues of demographics, differing student quality, differing school circumstances, etc. Such a scheme is indeed absurd, as his counterexample proves, but it is not what pretty much anyone has proposed! The actual proposals are well aware of what he thinks is the fatal problem, and often go to elaborate lengths to model and adjust for these sorts of heterogeneities in order to quantify the value-added of a particular teacher. The problem is recognized, included, and mostly dealt with. Whether the solution works entirely or is worthwhile is unclear, but he's arguing against a strawman.

> One estimate has ~12% of NYC public school teachers being punished by the flawed VAM in use there:

So I've looked at http://mathbabe.org/2015/04/02/the-arbitrary-punishment-of-n... and I have zero idea what she is trying to show. She assumes independence and treats it as a coin flip. Ummm... what? With that sort of logic, you could show no one could expect to score a 1600 on the SAT. When criticized, she links to a real analysis†, which shows considerable non-independence, which means her numbers are wrong and will overstate how many will be denied tenure based on the VAMs.

By the way, why are you phrasing it as 'punished'? That sounds like you're assuming your conclusion. If VAM doesn't affect hiring decisions, there's no point in bothering with it in the first place, is there? But if it does affect hiring decisions, that means teachers are being 'punished'...?
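To make the independence point concrete, here is a rough simulation sketch. It is not her model or NYC's actual rule: the flagging rule (bottom decile on at least one of two scores), the bivariate-normal scores, and the function name `flag_rate_any` are all assumptions made up for illustration; only the r=.35 figure comes from the linked analysis.

```python
import numpy as np

def flag_rate_any(r, n=200_000, cutoff=0.10, seed=0):
    """Share of simulated teachers who land in the bottom `cutoff`
    on at least one of two scores correlated at r.
    (Hypothetical flagging rule, for illustration only.)"""
    rng = np.random.default_rng(seed)
    cov = [[1.0, r], [r, 1.0]]
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # Per-score cutoff: the empirical 10th percentile of each column.
    thresholds = np.quantile(scores, cutoff, axis=0)
    flagged = (scores < thresholds).any(axis=1)
    return flagged.mean()

independent = flag_rate_any(0.0)   # the coin-flip assumption
correlated = flag_rate_any(0.35)   # scores correlated at r=.35

print(f"independent: {independent:.3f}")
print(f"correlated:  {correlated:.3f}")
```

Under independence the flag rate should come out near 1 - 0.9^2 = 0.19; with positively correlated scores the same teachers tend to be low on both, so fewer distinct teachers get flagged and the independence figure overstates the count.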
†Not that I think too much of it either, since it relies mostly on an argument from incredulity and pointing angrily at some scatterplots, and tries to ignore the r=.35 correlation between ratings in two subjects. To put r=.35 in perspective, the correlation between years of education and intelligence is only ~r=.55! Even the best IQ tests won't correlate with Gf more than r=.7 or so. r=.35 is pretty good for a single pair. I don't know why he thinks a .24 is 'minuscule' when that means you're predicting half of variance... (I wonder if this is a graphing problem? He doesn't seem to jitter the datapoints, which for a large amount of discrete data will hide a lot of the density; a plot of r=.35 with n=6k should look much more striking, like this: http://imgur.com/KcwmJJH )

For implications, look at the first graph and think about classification rates. Look at the datapoints at 100 along one axis, then look across to see how many correspond to <10 on the other; hardly any do, and the 100s are almost all mapped onto 80+ on the other axis. Or look at the 0s. In terms of identifying the bottom decile, it's doing a good job. |