
Thursday, September 9, 2010

The Test Mess and CTB McGraw Hill

Why NY's test mess is far from over
By FRED SMITH, NY POST, September 8, 2010

New York's "test mess" is worse than even the avowedly reformist state education leaders have acknowledged -- and it may not be over yet, either.

A close look at the data (some of which became available only via the Freedom of Information Law) strongly suggests that the exams created each year by CTB/McGraw-Hill -- which purportedly measure the math and English proficiency of 1.2 million New York students -- are fundamentally flawed. That means that even the "recalibrating" of the scoring by Regents Chancellor Merryl Tisch and state Education Commissioner David Steiner can't fix the problem.

The state Education Department paid the company $38 million for the tests used in 2006-'09.

Like most such tests, CTB's exams contain both multiple-choice and constructed-response questions. The latter ask students to produce a response, for example showing how they solved a math problem or writing answers to express their understanding of reading passages.

Constructed-response items take more time and money to administer and score -- but educators generally believe these questions measure a higher order of knowledge and thought than multiple-choice items, which kids typically find less challenging.

Yet results from both types of questions should point in the same direction -- that is, if this year's 4th graders do markedly better on the math multiple-choice questions than they did the year before, then they ought to improve on the math constructed-response items, too.

In other words, on well-developed tests, the results on both types of questions are in harmony -- pointing in the same direction and nearly parallel from one year to the next. After all, each is supposed to tap a different level of knowledge of the same subject. Performance should move in a synchronized way.

That's exactly the pattern shown on the National Assessment of Educational Progress -- nationally and in New York. The "nation's report card" uses both types of items to measure reading and math proficiency -- and the performance of New York kids on both is strikingly consistent over time.

Not so, the results on the state exams.

Consider just the math tests, administered every year to students in each of six grades. We have data on the four years from 2006 to 2009, so we can look at whether scores went up or down for the six grade levels in each of three school years -- 18 comparisons in total.

In 10 of the 18 cases, raw scores (i.e., the percentage of questions answered correctly) rose on one of the types of question, but fell for the other. In four cases, there was a smaller divergence. In only four cases did the scores clearly move in the same direction.
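The tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbers below are invented placeholders, not the actual state data, and the `flat` threshold used to separate a "smaller divergence" from a clear one is a hypothetical choice, since the article does not say where that line was drawn.

```python
from collections import Counter


def classify(mc_change, cr_change, flat=0.5):
    """Label one grade/year comparison by how the two item types moved.

    mc_change, cr_change: year-over-year change in raw score (percentage
    points) on multiple-choice and constructed-response items.
    flat: hypothetical threshold below which a change is treated as
    essentially no movement (an assumption, not taken from the article).
    """
    if abs(mc_change) < flat or abs(cr_change) < flat:
        return "unclear"      # one type barely moved: a smaller divergence
    if (mc_change > 0) != (cr_change > 0):
        return "opposite"     # one type rose while the other fell
    return "same"             # both rose or both fell


# Invented placeholder values: (grade, school years, MC change, CR change).
comparisons = [
    (3, "2006-07", 2.1, -1.4),
    (4, "2006-07", 1.8, 0.2),
    (5, "2006-07", 3.0, 2.2),
]

tally = Counter(classify(mc, cr) for _, _, mc, cr in comparisons)
print(dict(tally))
```

Run over all 18 grade-by-year comparisons, a count like this is what separates the cases where the two question types diverge from those where they move together.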

Any testing professional should recognize this as an alarm bell: Something is seriously wrong with these exams. (And it is the tests, not the students or anything else: Again, the NAEP exams, covering the same areas, do not show these bizarre divergences over time.)

There are several more disturbing facts about the 2006-'09 exams:

* Larger gains were usually made on multiple-choice items than on constructed-response items. This boosted the overall score -- leading to press releases and headlines that suggested everything was improving.

* Worse, data that contradict that storyline went undisclosed: The public didn't see separate analyses of constructed-response scores.

* Statistics (obtained via the Freedom of Information Law) on the field tests (where questions get "tried out" prior to creating the actual exams) show inconsistencies between multiple-choice and constructed-response items. CTB should have seen this data and realized it had a big problem.

Internal consistency is a mark of test reliability -- and without reliability, tests can't measure anything in a valid way. And New York's exams have clearly been lacking in consistency.

It's likely the just-released 2010 test results bear the same fatal flaw. The "solutions" on offer from Tisch and Steiner -- raising "cut scores" and increasing the scope of material on the exams -- don't address the overriding issue.

What's needed is an independent probe of the testing program, one with sweeping authority to investigate the role of Education Department officials, CTB measurement specialists and the state's technical advisers in all aspects of the program.

I believe we've been sold defective goods. For starters, we should demand our money back.

Fred Smith, a retired Board of Education senior analyst, worked for the city public-school system in test research and development.
