by PathOfEclipse 1293 days ago
You're conflating things and missing the plot. I was comparing the teacher evaluations given by teacher unions to the evaluations of teachers based on student improvement over time on standardized tests. You, on the other hand, changed the subject slightly, which is fine, but then you use the phrase "grades given by unionized teachers", which makes no sense. All the studies you cited talk about GPAs in general. Whether the teachers doing the grading are unionized or not is not mentioned or relevant. If you're going to have a discussion, you should take care to use more careful language. It looks like you're trying to be deceptive in order to prop up teachers' unions.

I clicked on your first study and it doesn't even do any analysis on GPA as a predictor of future success relative to other standardized tests. In fact, it assumes this as true and then tries to explain why. It also looks like a low-quality paper published at a conference without peer review.

I clicked on your second link. It's not peer reviewed. It only examines data from one university. It has a very small sample size. It has zero citations. It reaches conclusions that are contrary to existing literature. It does not look like it should be taken seriously.

I decided at this point to stop wasting my time. If your first two citations are so weak, I don't have much confidence in the others.


> I was comparing the teacher evaluations given by teacher unions to the evaluations of teachers based on student improvement over time on standardized tests.

Wait. You were comparing the evaluations of teachers based on how teacher unions ranked them, compared to the evaluation of teachers based on the standardized test scores for their students?

How is that at all useful?

Why are teacher unions ranking teachers? How does that affect anything? How are standardized tests - which aren't designed as a measure of teacher effectiveness - at all relevant, and not full of noise?

We know methods like VAM (Value-Added Models) are extremely easy to misuse, so easy that the American Statistical Association points out how difficult it is to apply them to ranking teacher effectiveness (see https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.p... ). Why should I believe this book you cite, which seems to be written by a journalist and not a statistician, does a good job of it?

> then you use the phrase "grades given by unionized teachers", which makes no sense.

That's because I didn't understand what your argument was. The usual argument is "teachers unions mean teachers are bad at their jobs so we can't trust their judgement and GPA. Instead, we need to look to standardized tests." That's the argument I thought you were making.

> it doesn't even do any analysis on GPA as a predictor of future success relative to other standardized tests

No, it doesn't. It does give the citation: Bowen, Chingos, & McPherson, 2009. But https://www.degruyter.com/document/doi/10.1515/9781400831463... is behind a paywall.

If you want to argue that the summary of the citation I gave is a poor interpretation of the research, then go ahead. But then I can say that your summary of the book you read is also wrong.

A book which I also cannot read.

And which does not appear to be peer reviewed.

> It's not peer reviewed. It only examines data from one university.

Thing is, the second link also gives citations to other research.

] If standardized testing is not as reliable a measure of student success, as proposed by the researchers previously cited ... Hodara and Lewis (2017) concluded that HSGPA was a better predictor of college performance than standardized exam scores, especially for students who enter college within a year of completing high school.

These are not meant to be read in a vacuum, but as an indication that the certainty you state is far from established.

It seems like you're not at all understanding what I've been saying. The measure of teacher performance was based on student improvement over time on standardized tests. It is incredibly valuable to measure a teacher's capability in actually helping students learn and improve. After all, isn't that the sole purpose of teaching?

> That's because I didn't understand what your argument was. The usual argument is "teachers unions mean teachers are bad at their jobs so we can't trust their judgement and GPA. Instead, we need to look to standardized tests." That's the argument I thought you were making.

Yes, and we've also gone pretty far off-topic from what I was originally talking about at this point.

> No, it doesn't. It does give the citation:

Then why not lead with that citation and not the very weak conference paper that you chose to lead with?

Anyways, you may be right about GPA currently being a better predictor of academic success than the ACT. But, as this article explains: https://www.jamesgmartin.center/2020/02/gpa-or-sat-two-measu...

"In an extended version of their essay, Kuncel and Sackett acknowledge that GPA is the best predictor of student success, but they add: “Even better prediction is obtained by the combination of test scores and high school grade point average.” “Human behavior is notoriously difficult to forecast,” they write, “it would be strange for a single predictor to be the only one that matters. So it is also valuable to consider, whenever possible, how predictors combine in foretelling student success.”

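As a rough illustration of the "combination" point in that quote, here is a minimal Python sketch on purely synthetic data. Everything in it is an assumption made for illustration: the made-up `ability` variable, the noise levels, and the names `gpa`, `test_score`, and `college`. It only demonstrates a general property of least squares (in-sample, two predictors never fit worse than either one alone) and says nothing about real admissions data.

```python
import random


def centered(values):
    """Return the values with their mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def r2_single(x, y):
    """R^2 of a one-predictor least-squares fit (squared correlation)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) ** 2 / (dot(xc, xc) * dot(yc, yc))


def r2_pair(x1, x2, y):
    """R^2 of a two-predictor least-squares fit with an intercept."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y, syy = dot(a, yc), dot(b, yc), dot(yc, yc)
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the two slopes.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy


# Synthetic students: one hidden "ability" drives all three noisy measures.
# These numbers are invented for illustration, not taken from any study.
rng = random.Random(0)
ability = [rng.gauss(0, 1) for _ in range(2000)]
gpa = [a + rng.gauss(0, 0.8) for a in ability]
test_score = [a + rng.gauss(0, 1.0) for a in ability]
college = [a + rng.gauss(0, 1.0) for a in ability]

r2_gpa = r2_single(gpa, college)
r2_test = r2_single(test_score, college)
r2_both = r2_pair(gpa, test_score, college)
print(f"GPA only: {r2_gpa:.3f}  test only: {r2_test:.3f}  both: {r2_both:.3f}")
```

How much the second predictor adds depends entirely on the assumed noise levels, so the size of the gain here should not be read as evidence either way.
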
If I were to further research this subject, I'd probably start with this book: https://www.amazon.com/Measuring-Success-Testing-College-Adm...

"Although the test-optional movement has received ample attention, its claims have rarely been subjected to empirical scrutiny. This volume provides a much-needed evaluation of the use and value of standardized admissions tests in an era of widespread grade inflation. It will be of great value to those seeking to strike the proper balance between uniformity and fairness in higher education."

Edit: also found this article interesting: https://www.latimes.com/california/story/2019-12-22/grades-v...

One person is quoted as pointing out that an advantage of using the SAT is that it can help combat grade inflation, because it looks bad to have a really high GPA but a really low SAT score. It's also been shown over time that the average GPA keeps going up while SAT scores are flat or declining. Grade inflation is a major problem, and the use of standardized tests does help with it.

> The measure of teacher performance was based on student improvement over time on standardized tests.

Except 1) those tests weren't designed for that purpose, and 2) they are a worse measure of student preparedness than GPAs, and 3) they only test those topics which are easy to test in a standardized setting.

And as I pointed out, the statistical methods used to find these patterns, like VAM, are intrinsically difficult and easy to misinterpret.

> Then why not lead with that citation and not the very weak conference paper that you chose to lead with?

Because it was more informative than the citation you presented: a non-peer-reviewed book by a journalist, which I couldn't easily read, and whose results as you presented them are contrary to my (limited) understanding of the topic.

> I'd probably start with this book

Since you think peer review is important, why do you point to non-peer-reviewed sources?

Just looking at the authors shows that I expect them to have a pro-standardized testing viewpoint. All three of them work/have worked for a standardized testing company.

Sean P. "Jack" Buckley is an Institute Fellow and works with AIR on several projects in the areas of applied statistics, social sciences, and education policy. He is also President and Chief Scientist for Imbellus, a California-based assessment company ... he helped lead the redesign of the SAT at the College Board

Lynn Letukas is an associate research scientist at the College Board

Ben Wildavsky is/was a senior fellow and executive director of the College Board Policy Center.

> Except 1) those tests weren't designed for that purpose

I don't know if that's true. And even if it were, why would they need to be designed for that purpose to be successfully and correctly used for that purpose? In fact, at one time the federal government required this data to be provided by schools. However, the teachers' unions lobbied hard, and the 2015 "Every Student Succeeds Act" barred the government from requiring this data: https://www.edweek.org/policy-politics/essa-loosens-reins-on...

"But the teachers’ unions see an opening to change policies their members have broadly rejected. They are also far more powerful among state legislatures than in Congress."

"The American Federation of Teachers plans to bring its political clout to bear on the issue, too."

On the other hand, strong research exists to show that SGPs are a valid and useful measurement: https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.104.9.2593

"The main lesson of this study is that value-added models which control for a student’s prior-year test scores provide unbiased forecasts of teachers’ causal impacts on student achievement. Because the dispersion in teacher effects is substantial, this result implies that improvements in teacher quality can raise students’ test scores significantly."

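For anyone unfamiliar with the term, a toy version of "value-added with a control for prior-year scores" might look like the sketch below. This is not the estimator from the paper above, which is considerably more involved; the teachers, scores, and effect sizes are all made up, and `fit_line` and `value_added` are hypothetical helper names used only for illustration.

```python
import random
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def value_added(records):
    """records: iterable of (teacher, prior_score, current_score).

    Predict each current score from the prior score alone, then average
    the leftover residuals per teacher. That average is the toy "VA".
    """
    prior = [r[1] for r in records]
    current = [r[2] for r in records]
    slope, intercept = fit_line(prior, current)
    residuals = defaultdict(list)
    for teacher, p, c in records:
        residuals[teacher].append(c - (intercept + slope * p))
    return {t: sum(v) / len(v) for t, v in residuals.items()}


# Synthetic data: three invented teachers with assumed true effects.
rng = random.Random(1)
true_effect = {"A": -5.0, "B": 0.0, "C": 5.0}
records = []
for teacher, effect in true_effect.items():
    for _ in range(200):
        prior_score = rng.gauss(50, 10)
        current_score = 10 + 0.8 * prior_score + effect + rng.gauss(0, 5)
        records.append((teacher, prior_score, current_score))

va = value_added(records)
for teacher in sorted(va):
    print(teacher, round(va[teacher], 2))
```

The toy assumes students are assigned to teachers at random and that each teacher has 200 students. Real classes are far smaller and not randomly assigned, which is where much of the disagreement over these models comes from.
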
And the follow-up study: https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.104.9.2633

"This paper has shown that the same VA measures are also an informative proxy for teachers’ long-term impacts."

> 2) they are a worse measure of student preparedness than GPAs

GPAs are highly subjective, and more importantly, harder to compare across schools and even across classes. By using standardized scores, for instance, one could track successfully that a teacher's performance remains consistent as he or she moves across schools. Remember, this was about measuring teacher performance, not student performance. That said, if GPAs really were better for teacher evaluation, there is nothing stopping you from measuring student GPA improvement instead of student standardized test score improvement, so I'm not sure what you're really arguing against at this point.

> 3) they only test those topics which are easy to test in a standardized setting.

Many important topics taught in secondary school are well-understood and amenable to standardized testing, including: math, reading comprehension, grammar, some aspects of science and history, etc.

> Since you think peer review is important, why do you point to non-peer-reviewed sources?

These books cite peer-reviewed sources and are a great starting point before digging further.

> Just looking at the authors shows that I expect them to have a pro-standardized testing viewpoint.

Everyone is biased. The NEA spends millions convincing people to drop standardized tests through their advocacy group, FairTest, which serves as one of their front organizations. Much of education academia is biased against standardized testing. Biases are everywhere, and were fairly obviously present in your sources that I checked. At some point, you have to pick a bias you trust more, and I trust the bias that says standardized tests are useful over the bias that says they should be entirely done away with.

> "This paper has shown that the same VA measures are also an informative proxy for teachers’ long-term impacts."

Ah, as I figured, you are promoting VAM. I already mentioned how it's a difficult tool to use. And there are well-known problems with using VAM which aren't mentioned in that paper, which you don't seem to be aware of.

For example, a Texas court threw out EVAAS as a way to evaluate Houston teachers because of due process concerns, like how teachers are unable to have their score independently re-evaluated. The judge also points out the "house-of-cards" nature of VAM, and the ongoing academic debate about its applicability. https://www.courthousenews.com/wp-content/uploads/2017/05/Ho...

The VAM opponent expert witness presented their main arguments. Quoting http://vamboozled.com/houston-lawsuit-update-with-summary-of...

1) Large-scale standardized tests have never been validated for this use.

2) When tested against another VAM system, EVAAS produced wildly different results.

3) EVAAS scores are highly volatile from one year to the next.

4) EVAAS overstates the precision of teachers' estimated impacts on growth.

5) Teachers of English Language Learners (ELLs) and “highly mobile” students are substantially less likely to demonstrate added value

6) The number of students each teacher teaches (i.e., class size) also biases teachers’ value-added scores.

7) Ceiling effects are certainly an issue.

8) There are major validity issues with “artificial conflation.” (This is the phenomenon in which administrators feel forced to make their observation scores "align" with VAAS scores.)

9) Teaching-to-the-test is of perpetual concern.

10) HISD is not adequately monitoring the EVAAS system. HISD was not even allowed to see or test the secret VAM sauce.

11) EVAAS lacks transparency.

12) Related, teachers lack opportunities to verify their own scores.

Here's one paper analyzing the specific details of the EVAAS numbers SAS generated for Houston - https://www.researchgate.net/publication/341532272_Methodolo... , with citations of its own about various issues with VAM. More below (via Google Scholar 'EVAAS houston effective').

> consistent as he or she moves across schools

Here's another paper: https://www.redalyc.org/pdf/2750/275022797012.pdf . "Almost half (46%) of a sample of HISD teachers who moved to different grade levels reported switching value-added ranks after the move, from “ineffective” to “effective” or vice versa, also across grade levels that were adjacent".

If it's not consistent when moving grade levels, why do you think it's consistent moving across schools?

Is it because "Dr. William L. Sanders, the developer of the SAS ® EVAAS®, claims that teachers who move from one environment to another, even if radically different, continue to do just as well (LeClaire, 2011)"?

> GPAs are highly subjective, and more importantly, harder to compare across schools and even across classes.

And yet are a better predictor of future academic success than test scores. As I highlighted.

It appears you prefer to use a worse predictor, one which requires an artificially imposed "high-stakes" testing environment, because it lets you do fancier types of data science that appeal to your sense that numbers are objective.

> strong research exists to show that SGPs are a valid and useful measurement

Remember earlier how you implied these methods were objective?

Odd that the paper you linked to says the other VAM methods didn't factor for a "drift in teacher quality".

Almost as if there's no agreement on what the model should be.

Almost as if the choice of model to use was also "highly subjective."

If they aren't subjective, then different VAM models should make the same predictions for the same population, right?

Points #2 and #3 above should be very rare, right?

And if they are not rare, they should not be used to determine who to fire, right?

> Remember, this was about measuring teacher performance, not student performance.

And VAM has not proved useful at measuring teacher performance, because of the flaws I quoted above.

I believe you approve of the idea of firing teachers with low VAM scores, which Houston and other school districts have done. Yet, quoting now from "All sizzle and no steak: Value-added model doesn’t add value in Houston" at https://journals.sagepub.com/doi/full/10.1177/00317217177341...

] while EVAAS was in use for educational reform purposes in Houston (i.e. to increase student achievement), Houston students saw no improvements of the sort that had been promised in grades 3-8 in reading, grades 4 and 7 in writing, grades 5 and 8 in science, and grade 8 in social studies (Figure 1, blue trend lines). In those subject areas and grades, tests scores declined overall from 2012 to 2015, as compared to other similar students throughout the state (black trend lines).

Almost as if VAM-based firing isn't a useful tool.

> and amenable to standardized testing

Yes, that's exactly my point. You highlighted the areas which are easy to test.

Composition is not easy to test, and it's also important. Being able to write an essay on Populism in the late-1800s US is not easy to test (not impossible: the AP American History tests do this, but it's expensive). But this is also a skill taught in school. My school required students to take a practical art course. Yet testing for drafting skills, or woodworking, or auto repair, isn't included in the high-stakes testing.

Why does it just happen to be that only those things which are easy/cheap to test are coincidentally the right topics to test?

> Everyone is biased

Film at 11. I don't listen only to Philip Morris scientists to judge if smoking tobacco has health problems.

> I trust the bias that says standardized tests are useful

So far it doesn't seem like you are aware of the evidence that VAM is not an effective method for deciding if a teacher should be fired. That would easily explain your comments.

> Ah, as I figured, you are promoting VAM.

I am promoting student growth measures in teacher evaluations, which is related to VAM, but not necessarily the same thing depending on who you are talking to: https://www.michigan.gov/mde/services/ed-serv/educator-reten...

> If it's not consistent when moving grade levels, why do you think it's consistent moving across schools?

Some models have been shown to be consistent across both grade levels and schools.

> And yet are a better predictor of future academic success than test scores. As I highlighted

And as I highlighted, GPAs are too subjective and in control of the teacher. If GPAs were used for teacher evaluation a teacher could write his or her own pay check via grade inflation. Standardized tests also act as a counterbalance in general to slow down grade inflation.

> Almost as if VAM-based firing isn't a useful tool

Neither is trusting the teachers' unions to decide who to fire, because they fire virtually no one and even protect known incompetent teachers. We need a better solution than what we have now, and SGPs/SGMs have been shown to be far more effective than what we have now.

> So far it doesn't seem like you are aware of the evidence that VAM is not an effective method for deciding if a teacher should be fired

There is also plenty of evidence for using student growth measures as part of teacher evaluations. You seem happy, however, to ignore the evidence I've given you that goes against your own bias. A little self-awareness might be in order.