Hacker News
by FabHK 2675 days ago
That reminds me of what I think was a flaw in the computer-based adaptive GRE. It is called "adaptive" because it gives you harder questions when you're doing well and easier ones when you're doing badly, which allows the same measurement precision, if you will, with fewer questions.
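A minimal Python sketch of the "harder if right, easier if wrong" rule follows. It assumes a simple up/down staircase over difficulty levels 1 to 9. Operational adaptive tests generally choose items with item response theory models, so this only illustrates the idea. `staircase` and the `answer` callback are hypothetical names.

```python
def staircase(answer, n_questions=10, start=5, lo=1, hi=9):
    """Up/down staircase: one level harder after a correct answer,
    one level easier after a wrong one (clamped to lo..hi).

    `answer(level) -> bool` is a hypothetical stand-in for the test-taker.
    Returns the list of difficulty levels that were presented.
    """
    level = start
    history = []
    for _ in range(n_questions):
        history.append(level)
        if answer(level):
            level = min(hi, level + 1)
        else:
            level = max(lo, level - 1)
    return history


# A taker who answers correctly exactly when the level is 6 or lower
# ends up oscillating around that boundary.
print(staircase(lambda d: d <= 6, n_questions=6))  # [5, 6, 7, 6, 7, 6]
```

The oscillation around the taker's boundary is what lets an adaptive test home in on a level with fewer questions than a fixed test that covers every level.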

My Spanish girlfriend and I studied English vocabulary to prepare for the GRE and took many old-fashioned, paper-based (non-adaptive) practice tests, then later the actual (adaptive) one. Her result on the actual test was much worse than on the practice tests, by many standard deviations (only in the verbal section, not in the quantitative section).

Now, in English, the more difficult words are frequently of Latin origin (for example, "to lament" vs. "to mourn"). However, those are often cognates of the equivalent Spanish words, and thus easier for her. So my hypothesis is that she got some questions wrong early on, and the algorithm then gave her "easier" questions (with more Germanic words), which were actually harder for her, while withholding the harder questions that she could have answered correctly.

Intriguingly, it might have gone the other way around, depending on whether you first got predominantly Germanic words, answered them wrong, and got even more of them, or first got predominantly Latinate words, answered them correctly, and got even more of them.

Thus, if ETS tested the adaptive algorithm on native English speakers, the adaptive test might have lined up very well with the traditional test, validating it.

(Now we're coming to the intriguing part.) If they also tested it with Latinos/native Spanish speakers, the mean deviation (between the adaptive and paper-based results) might well have been very small too, but the variance of the deviation larger: many large deviations to the upside, many large deviations to the downside.
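The variance argument can be explored with a toy simulation. Everything in it is an assumption made for illustration: difficulty levels 1 to 9, a share of Latinate items that rises linearly with difficulty, a logistic chance of answering correctly, and a hypothetical `cognate_bonus` that makes Latinate items easier and Germanic items harder for a Spanish speaker (0 for a native speaker). It compares each simulated taker's adaptive (staircase) score with a fixed, paper-style score. The hypothesis predicts a small mean deviation in both groups but a larger spread when `cognate_bonus > 0`. Both scores are mapped onto the 1 to 9 scale only crudely, so compare the spreads rather than the absolute means. This says nothing about the real GRE, and no particular output is claimed.

```python
import math
import random
import statistics

LEVELS = list(range(1, 10))  # difficulty 1 (easiest) .. 9 (hardest)


def p_latinate(level):
    # Assumption: harder items are more often Latinate (0.1 at level 1, 0.9 at level 9).
    return 0.1 + 0.8 * (level - 1) / 8


def answers_correctly(rng, ability, level, cognate_bonus):
    # Draw the item's origin, then answer with a logistic probability.
    # cognate_bonus lowers the effective difficulty of Latinate items and
    # raises it for Germanic ones.
    latinate = rng.random() < p_latinate(level)
    effective = level - cognate_bonus if latinate else level + cognate_bonus
    return rng.random() < 1 / (1 + math.exp(effective - ability))


def fixed_score(rng, ability, bonus, per_level=4):
    # Paper-style test: per_level items at every level, regardless of answers.
    correct = sum(answers_correctly(rng, ability, lvl, bonus)
                  for lvl in LEVELS for _ in range(per_level))
    return 1 + 8 * correct / (per_level * len(LEVELS))  # crude map onto 1..9


def adaptive_score(rng, ability, bonus, n_items=20, start=5):
    # Up/down staircase; the score is the mean level over the second half.
    level, history = start, []
    for _ in range(n_items):
        history.append(level)
        if answers_correctly(rng, ability, level, bonus):
            level = min(LEVELS[-1], level + 1)
        else:
            level = max(LEVELS[0], level - 1)
    return statistics.mean(history[n_items // 2:])


def deviation_stats(bonus, n_takers=2000, seed=0):
    # Mean and standard deviation of (adaptive - fixed) across simulated takers.
    rng = random.Random(seed)
    devs = []
    for _ in range(n_takers):
        ability = rng.uniform(3, 7)
        devs.append(adaptive_score(rng, ability, bonus)
                    - fixed_score(rng, ability, bonus))
    return statistics.mean(devs), statistics.stdev(devs)


if __name__ == "__main__":
    for label, bonus in [("native speaker", 0.0), ("cognate bonus", 2.0)]:
        mean, sd = deviation_stats(bonus)
        print(f"{label}: mean deviation {mean:+.2f}, sd {sd:.2f}")
```

Seeding `random.Random` makes each run reproducible, so the two groups can be compared under identical random draws.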

I wonder whether that was ever researched in depth, and whether it could have been grounds for complaints: members of some community would have had measurements that were "worse", not in the sense of biased, but of "less precise", with more variance.

> I wonder whether that was ever researched in depth

Yes, it was. My wife studied linguistics and has an MSc in Education; she has dozens of books on grading and evaluating English learners, and the GRE is a test that has been studied extensively. I don't have a reference at hand (I'm on my mobile), but feel free to search any education-related journal: you'll find tons of studies.

You're hypothesizing an additional axis of question variation, "Romance-Germanic", alongside the well-known axis "difficulty". Do we have any reason other than this anecdote to believe this other axis actually exists for the GRE? Why wouldn't easy Romance questions be sorted in with easy Germanic questions?

One reason to believe it (or, more specifically, to believe that the correlation between the Romance-Germanic axis and the difficulty axis is not zero) is that easier, more common vocabulary is often Germanic and harder, more literary vocabulary is often Romance. This is a result of the origins of social classes and of a deliberate preference, among those of higher learning or standing, for Romance words (or Latin words imported outright). That's the claim of several comments in this thread.