Fitting to a statistical model superficially makes sense, but I think the details kill it. The outcome you are measuring is the change in test score from before a student has a teacher to after. VAM attempts to statistically estimate the teacher's contribution to that change. Presumably the test covers material that, in theory, the students won't know beforehand. That means teachers don't want students who study on their own (or who take part in activities where that knowledge might be useful). And they don't want students who aren't going to learn it (whoops, that was a leap; I meant students who aren't going to test higher at the end). So you don't really want either the top tier or the bottom tier coming into your class.
This isn't specific to VAM, but whenever standardized test results are used for anything meaningful to the teacher (salary, tenure, etc.), anything not on the test carries an opportunity cost and will be dropped in favor of test prep. The more statistical validity VAM has, the stronger this effect will be. If a teacher shows students how to fit their new knowledge into a broader perspective, the school's scores may improve, but it will screw over the next teacher in line, because that teacher's "before" scores will be higher. So there's some peer pressure to make sure students learn nothing they're "supposed" to learn later.
Consider a subject like math: at some point many students fall behind. This makes later topics much, much harder, because they build on what the students never quite understood. A perfect teacher would figure out the right balance of old and new material for each individual student. That perfect teacher would score poorly on VAM compared to a teacher who crammed in test-specific mechanics and regurgitation, relying on dismal starting scores to make poor-but-not-awful ending scores look good.
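To make the "change in test score" outcome concrete, here is a minimal sketch of a gain-score estimate: each teacher's mean student gain minus the overall mean gain. This is a deliberate simplification (published VAMs are usually regression or mixed models with prior scores and other covariates), and the function name and record format are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gain_score_vam(records):
    """Hypothetical simplified VAM.

    records: iterable of (teacher, pre_score, post_score) tuples.
    Returns {teacher: mean gain of that teacher's students - overall mean gain}.
    """
    gains_by_teacher = defaultdict(list)
    for teacher, pre, post in records:
        # The outcome is the change from the "before" test to the "after" test.
        gains_by_teacher[teacher].append(post - pre)

    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    overall = mean(all_gains)
    return {t: mean(gains) - overall for t, gains in gains_by_teacher.items()}


result = gain_score_vam([
    ("A", 40, 60), ("A", 50, 65),  # students starting low, lots of room to gain
    ("B", 85, 90), ("B", 90, 92),  # students starting near the ceiling
])
print(result)
```

In this toy data teacher A's students gain 20 and 15 points and teacher B's gain 5 and 2, so A comes out at +7 and B at -7 relative to the average, even though B's students simply had less room to improve. That is the top-tier problem described above.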
The system would gradually optimize for squeezing incremental gains out of improperly taught students. And don't forget that the outcome is what's measured, and what's measured is crap. In football, you can look at a score (or just at who won). Here, the structure is tuned to produce students who do well on year-end tests and nothing else, and certainly not at applying their knowledge to situations unlikely to show up on a test. OK, this became more of a rant against standardized testing, but it bothers me that adding statistical power magnifies the problems. You'd be better off throwing in a large random component, so that teachers' innate desire to teach well has a chance of winning out over gaming the system. Even if your population of teachers is really conscientious, you're actively selecting for those most willing to play the game. And selection always wins in the end.
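The selection argument can be illustrated with a toy simulation. It assumes, purely hypothetically, that each teacher has a "gaming" trait between 0 and 1 that adds directly to their measured score, that a random component with standard deviation `noise_sd` is added on top, and that the top 10% by measured score are retained. The function name and all parameters are illustrative, not taken from any real evaluation system.

```python
import random
from statistics import mean


def mean_gaming_of_retained(noise_sd, n_teachers=10_000, keep_fraction=0.1, seed=0):
    """Hypothetical model of selecting teachers on a noisy measured score.

    Each teacher gets a 'gaming' trait drawn uniformly from [0, 1). Their
    measured score is that trait plus Gaussian noise with std dev noise_sd.
    Returns the mean gaming trait among the top keep_fraction by measured score.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    teachers = []
    for _ in range(n_teachers):
        gaming = rng.random()
        measured = gaming + rng.gauss(0, noise_sd)
        teachers.append((measured, gaming))

    teachers.sort(reverse=True)  # highest measured score first
    kept = teachers[: int(n_teachers * keep_fraction)]
    return mean(gaming for _, gaming in kept)


print(mean_gaming_of_retained(noise_sd=0.0))  # selection purely on the measure
print(mean_gaming_of_retained(noise_sd=5.0))  # a large random component added
```

With no noise, the retained group is made up entirely of the heaviest gamers. With a large random component, the retained group's average drifts back toward the population average of 0.5 but stays slightly above it, which matches both halves of the point: noise weakens the pressure to game, yet repeated rounds of selection still push in that direction.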
PS: There is a fair amount of momentum in many subjects, so teachers can affect not just this year's test results but next year's as well. In the end, it's really difficult to come up with a high-quality model, and my guess is they simply didn't bother.
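One way to see why momentum makes modeling hard is a minimal sketch that assumes, hypothetically, that a student's gain in a given year equals the current teacher's own contribution plus a fixed fraction (`momentum`) of the previous teacher's contribution. The function name and the linear carryover rule are illustrative assumptions only.

```python
def observed_gains(effects, momentum):
    """Hypothetical carryover model for one cohort of students.

    effects: each year's teacher's own contribution to the gain, in order.
    momentum: fraction of the previous teacher's contribution that shows up
              as extra gain in the following year.
    Returns the gains a gain-score model would actually observe each year.
    """
    gains = []
    previous = 0.0
    for effect in effects:
        gains.append(effect + momentum * previous)
        previous = effect
    return gains


print(observed_gains([10, 4], momentum=0.5))
```

Here the second-year teacher's own contribution is 4 points, but the observed gain is 9.0, because 5 points carry over from the first-year teacher. A model that credits each teacher with their own year's gain would misattribute those 5 points.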