DYSPEPSIA GENERATION

We have seen the future, and it sucks.

Letting Students Drop a Question: A Big Mistake

30th July 2008

Read it.

An interesting perspective. The whole question of how to create a fair and effective examination is one which doesn’t appear to get a lot of serious thought.

One Response to “Letting Students Drop a Question: A Big Mistake”

  1. Jay Says:

    “The whole question of how to create a fair and effective examination is one which doesn’t appear to get a lot of serious thought.”

    Actually, it’s received quite a lot of thought over the years — but most educators don’t know about it, and wouldn’t have time to use the tools even if they had them. There are various tests for reliability, content validity, criterion validity, and the like.

    Reliability is the degree to which a test produces the same measure for the same student each time. For instance, football games have low reliability — the same player can look great in one game and poor in another. Short-answer tests have high reliability; some kinds of essay tests have low reliability. (In high school, I learned that I could write a B- paper on a novel if I read the first, third, and last chapters of the novel.) Reliability can be measured with the Spearman-Brown formula or Cronbach’s alpha.
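    Cronbach’s alpha, mentioned above, is straightforward to compute: it compares the variance of individual items to the variance of students’ total scores. Here is a minimal sketch in Python, using made-up data (the score matrix and item labels are hypothetical, purely for illustration):

    ```python
    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for an (n_students, k_items) score matrix.

        alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
        """
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_vars = scores.var(axis=0, ddof=1)       # variance of each item across students
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of each student's total score
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    # Hypothetical data: 5 students x 4 short-answer items (1 = correct, 0 = wrong)
    scores = [[1, 1, 1, 1],
              [1, 1, 0, 1],
              [0, 0, 0, 0],
              [1, 0, 1, 1],
              [0, 0, 0, 1]]
    print(round(cronbach_alpha(scores), 3))  # → 0.79
    ```

    Values near 1 suggest the items are measuring the same underlying thing consistently; values near 0 suggest the test is mostly noise.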

    Criterion validity is the extent to which test scores reflect what you intended to measure. An essay test in statistics gives the highest grades to good writers, and so has little criterion validity. Elections give the job to people who can speak well in public, and so have good criterion validity only for candidates who can translate that skill into concrete action (Reagan) rather than using it to bamboozle (Clinton).

    Content validity represents the extent to which the test actually covers the content. A test of addition problems has low content validity if the class also covered subtraction, multiplication, or division. Allowing the students to drop a question lowers content validity if the point of the test is to measure broad knowledge of the entire subject area, but increases content validity if the point of the test is to measure the ability to fully develop a single theme.

    Various types of validity can be measured with Cohen’s kappa, Goodman-Kruskal tau, and various other measures.
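    Cohen’s kappa, for instance, measures how often two raters agree after correcting for the agreement you’d expect by chance alone. A minimal sketch (the rater labels and data here are hypothetical, for illustration only):

    ```python
    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Cohen's kappa: agreement between two raters, corrected for chance.

        kappa = (p_observed - p_expected) / (1 - p_expected)
        """
        n = len(rater_a)
        # Observed agreement: fraction of items where both raters gave the same label
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Expected chance agreement, from each rater's marginal label frequencies
        counts_a = Counter(rater_a)
        counts_b = Counter(rater_b)
        p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical: two graders judging whether 10 exam items cover the syllabus
    a = ["covered", "covered", "not", "covered", "not",
         "covered", "covered", "not", "covered", "covered"]
    b = ["covered", "covered", "not", "not", "not",
         "covered", "covered", "not", "covered", "not"]
    print(round(cohens_kappa(a, b), 3))  # → 0.6
    ```

    A kappa of 1 means perfect agreement; 0 means no better than chance — which is exactly why it beats raw percent agreement as a validity check.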

    The question of how to create a fair test has received a lot of attention over the years, but the answers aren’t going to be used by teachers and schools, because applying them is way too much work.

    To put it in perspective, I’m teaching a stat course with six tests and another eight assignments that could be validated. But I also have a temporary consulting job measuring content validity for a single system. The consulting job pays roughly four times as much as the class.