by sphink 4244 days ago
Fitting to a statistical model superficially makes sense. But I think the details kill it.

The outcome you are measuring is the change in a student's test score from before having a teacher to after. VAM attempts to statistically estimate the teacher's contribution to that change.
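To make that quantity concrete, here is a minimal sketch of the crudest possible version: each teacher's average before/after gain compared with the overall average gain. This is not the actual VAM model, which adjusts for much more. The function name `naive_value_added`, the record layout, and all the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def naive_value_added(records):
    """records: iterable of (teacher, before_score, after_score).

    Returns each teacher's mean student gain minus the overall mean gain.
    Hypothetical helper for illustration; real VAM models are far richer.
    """
    gains = defaultdict(list)
    for teacher, before, after in records:
        gains[teacher].append(after - before)
    # Average gain over every student, regardless of teacher.
    overall = mean(g for teacher_gains in gains.values() for g in teacher_gains)
    return {t: mean(gs) - overall for t, gs in gains.items()}

# Invented data: teacher A starts with low scorers, teacher B with high scorers.
records = [
    ("A", 10, 15), ("A", 20, 27),
    ("B", 80, 85), ("B", 90, 91),
]
scores = naive_value_added(records)
print(scores)
```

Note that nothing in this naive version accounts for where the students started, which is where the problems below come from.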

Presumably, the test is of something that theoretically the students will not know beforehand. Which means the teachers don't want students who study on their own (or participate in activities where that knowledge might be useful). And they don't want students who aren't going to learn it -- whoops, that was a leap, I meant to say who aren't going to test higher at the end. So you don't really want the top tier nor bottom tier coming into your class.

Nonspecific to VAM, but a result of standardized test results being used for anything meaningful to the teacher (salary, tenure, etc.) is that anything not on the test has an opportunity cost, and so will be omitted in favor of test prep. The more statistical validity that VAM has, the stronger this effect will be. If the teacher shows the students how to incorporate their new knowledge into a broader perspective, it may make the school's scores improve but it will screw over the next teacher in line (because the before test will be higher). So there's some peer pressure to make sure the students learn nothing that they're "supposed" to learn later.

If you consider a subject like math, what happens is that at some point many students fall behind. This makes the later topics much, much harder, because they build on what they never quite understood. A perfect teacher would figure out what balance of old and new material to give each individual student. That perfect teacher would score poorly on VAM compared to a teacher who crammed in test-specific mechanics and regurgitation, relying on dismal beginning test scores to make poor but not awful ending test scores look good. The system would gradually optimize for squeezing incremental gains out of improperly taught students.

And don't forget that the outcome is what's measured, and what's measured is crap. In football, you can look at a score (or just who won). Here, the structure is tuned to produce students who can do well on year-end tests and nothing else, certainly not students who can apply their knowledge to situations unlikely to show up on a test.

Ok, this became more of a rant against standardized testing, but it just bothers me that adding statistical power magnifies the problems. You'd be better off throwing in a large random component, so that teachers' innate desires to teach well have a chance at winning out over gaming the system. Because even if your population of teachers is really conscientious, you're actively selecting for those most willing to play the game. And selection always wins in the end.

2 comments

You're assuming the delta is based on just the prior test score vs. this one, i.e. old 10 vs. new 15 and old 80 vs. new 85 count as the same improvement. However, statistically there is a tendency to regress toward the mean, so simply staying at 80 ends up counting as statistical progress. That said, I suspect they're using a flawed model that ignores the tendency for school districts to stack high-performing teachers on top of other high-performing teachers. To correct for this you need to look at what happens when someone moves from one district to another.
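The regression-toward-the-mean point can be checked with a small simulation. This is only a sketch under assumed numbers: each student has a fixed ability (mean 50, SD 15) and each test adds independent noise (SD 10). There is no teacher effect at all, so any average "gain" in a group is purely statistical. The function names and score bands are hypothetical.

```python
import random
from statistics import mean

def simulate_retest(n_students=20000, seed=0):
    """Return (before, after) score pairs with NO teacher effect.

    Assumed model: score = fixed ability + independent test-day noise.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_students):
        ability = rng.gauss(50, 15)
        before = ability + rng.gauss(0, 10)
        after = ability + rng.gauss(0, 10)
        pairs.append((before, after))
    return pairs

def mean_gain(pairs, lo, hi):
    """Average (after - before) for students whose before score is in [lo, hi)."""
    return mean(after - before for before, after in pairs if lo <= before < hi)

pairs = simulate_retest()
high_gain = mean_gain(pairs, 75, 85)  # students who tested around 80
low_gain = mean_gain(pairs, 15, 25)   # students who tested around 20
print(f"before ~80: mean gain {high_gain:+.1f}")
print(f"before ~20: mean gain {low_gain:+.1f}")
```

Under these assumptions the group that tested around 80 drops on average and the group that tested around 20 rises, even though nobody was taught anything. A student who holds at 80 is therefore beating the expected retest score.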

PS: There is a fair amount of momentum in many subjects, so teachers can impact not just this year's test results but next year's as well. In the end it's really difficult to come up with a high-quality model, and my guess is they simply did not bother.

Well it's not like teachers only stay in their position for a year. The framework could (and should?) keep on monitoring the progress of the students down the way and feed back to the teachers' rating until they graduate. That would also increase peer pressure and collaboration between teachers.