Sunday, December 28, 2008

Testing, Teaching, and Tolerance

Joel Klein, the man who is pretending to be the Chancellor (but as he has no contract he is in violation of Education Law 2590-h; see the article "The 'Who Are You Kidding??' Award Goes To: Joel Klein, New York City Board of Education Pretender"), stormed out in anger after the NYC 'school board' or "Panel For Educational Policy" (PEP) voted to support his new 8th grade promotion policy, and the audience protested...VERY loudly. It seems that he is getting less and less tolerant of hearing voices he does not agree with or like.

I offer the article below which describes the issue of high stakes testing:

Report of the UFT Task Force on High Stakes Testing April 2007
Apr 20, 2007 2:26 PM


Anyone who enters the teaching profession has to have a commitment to educating students. Without that commitment the job is too demanding, the conditions too demeaning and the rewards too small. Teachers want their students to succeed. They hold themselves accountable for their students' performance. No teacher goes to work with the goal of failing a student. The biggest reward in this profession is seeing a student succeed. The question then is what kinds of tools, as well as knowledge, methods and support services will help teachers to do this. One tool that teachers need and value is reliable data about student progress and achievement.

For more than a decade efforts in New York City and New York State public schools to raise academic standards, improve the quality of education for all students and to help students succeed have been accompanied by an equally vigorous movement to develop and implement a variety of tests and assessments that ideally would not only measure student achievement but also would be useful in improving the quality of instruction on a day-to-day basis.

The Recent History of Testing

In the 1990s the New York State Education Department revamped the high school Regents exams and phased out an easier alternative for some students, the less demanding Regents Competency Tests. They introduced standardized tests in grades 4 and 8 that were intended to provide benchmarks for evaluating schools, not students. The passage in 2001 of the federal No Child Left Behind Act (NCLB) required all states to implement testing of reading and math in grades 3-8 as a school accountability provision (though not as a tool for making many high stakes decisions such as those about promotion or graduation). This requirement pushed New York State to expand its existing standardized testing program in reading and math from the 4th and 8th grades to grades 3, 5, 6 and 7. At the same time the New York State Board of Regents moved to make the standards for graduation more rigorous by requiring all students to pass a number of Regents exams as a prerequisite for graduation. The increase in the number and importance of standardized tests NCLB generated has had an enormous impact on schools throughout the state.

Here in New York City, at the same time as the state was revamping its testing program to conform with NCLB, the city school system continued to administer its own grade 3-8 testing program as required by the 1968 decentralization law and added another series of assessments in kindergarten through 2nd grade (most notably ECLAS) that exceeded the requirements of both the state and NCLB. For a time this led to duplicative testing by the state and the city of students in the 4th and 8th grades and the reporting of often conflicting student data. This duplicative testing ended in 2005 before it could expand to all students in grades 3 through 8 largely due to the public outcry of parents and teachers.

Commenting recently on the state of testing in the city, a former head of the DOE’s Division of Assessment and Accountability, on reviewing the testing calendar issued annually by his former office, remarked that in New York City's schools someone is being tested on four out of five days every week of the year.

Testing for Accountability

Although NCLB mandated testing to measure school performance, the increased number of tests students must take, and the importance given these tests as the sole determinant in such high stakes decisions as eligibility for gifted programs, promotion of students, and entrance into select schools (even though the tests used are not intended for these purposes), have also sparked angry responses from parents, teachers and students.

The testing and accountability provisions of NCLB delineate consequences for schools and districts that fail to make adequate yearly progress (AYP) towards a uniform level of proficiency for all students. The law only allows schools to make AYP if a certain percentage of students overall, and a certain percentage of students in all the subgroups based on race, gender, poverty, special needs and English language ability in the school, achieve an absolute level of proficiency. The federal law designates schools that fail to make progress as "schools in need of improvement" (SINI) and school districts must give students the right to transfer to a non-SINI placement. Designation as SINI, if ongoing, can result in other sanctions and punitive measures including ultimately closure or reorganization as a charter school under private management. This federal accountability system operates parallel to but independently of New York State's existing accountability measures, which use the state's 4th and 8th grade reading and math tests as the basis for state designation of schools as low performing or SURR (schools under registration review).

This year the New York City Department of Education (DOE) began its own citywide accountability system, using measures that differ from those of the federal and state governments and giving schools a letter grade of A, B, C, D or F. This rating system is modeled on a system that has been in use in Florida for more than five years and that, opponents there claim, has had a demoralizing and punitive effect on schools. (An October 2006 Zogby poll showed that 61% of Florida voters disagreed with using test scores as a basis for funding and grading schools.) New York City officials acknowledge that these different systems of evaluation can result in a school receiving, for example, a letter grade of A yet having a designation under NCLB as SINI or by the state as SURR.

The Misuse of Tests

The intuitive appeal of test scores as measures of student performance and of the tests as representative of high standards of achievement has prevented a meaningful discussion of their limitations and the negative effects of attaching high stakes consequences and sanctions to test results. It is not possible to explain all of the sometimes very technical limitations and consequences of high stakes testing in a few short sentences. Nevertheless, concern and controversy around testing and competing accountability systems have grown as the federal government and state and local school systems continue to misuse tests to determine promotion, graduation and other high-stakes decisions for students, as well as to serve as a basis for evaluation of schools and districts.

This task force sponsored a November 11, 2006 UFT conference on high stakes testing at which James Popham, former high school teacher, test designer and distinguished author, pointed out that those who advocate for the misuse of student test scores to evaluate individuals, schools, and entire school systems are ignorant of or choose to ignore the fact that the makers of these tests never intended them to be used for those purposes. The use of these tests for making these decisions is questionable at best, he said. He is not the only expert decrying this use of tests. Professional organizations such as the National Academy of Sciences, the American Psychological Association, the National Council on Measurement in Education, the National Council of Teachers of Mathematics, the National Council of Teachers of English and the National Parent Teacher Association have all come out against high stakes testing. The American Educational Research Association has stated that tests are always fallible and should never be used as high stakes instruments.

Yet wrongheaded proposals from Chancellor Klein, elected officials, corporate heads and other non-educators who do not understand the limitations of the test data continue to call for the misuse of student test scores in order to make important decisions about children as early as kindergarten. They are also proposing misusing these test results as an evaluative tool for teachers, as a factor in determining teacher salaries and as a basis for granting tenure.

The UFT Task Force

Rationale for its creation

As a result of the widespread and ongoing concerns that parents, teachers and students have expressed, and given the lack of real opportunities the public has had to question decisions the city and state have made, President Randi Weingarten announced in the late spring of 2006 the formation of a UFT Task Force on High Stakes Testing. The UFT leadership charged the task force to provide a forum for public debate and discussion and to make recommendations to help teachers and guide the UFT leadership in its ongoing discussions with city and state officials, and, in collaboration with the American Federation of Teachers, the federal government. Membership was open to all UFT members as well as officers and staff who expressed an interest in joining the task force. Vice-president Aminda Gentile chaired the task force.

A context for the work of the task force

Opponents of testing frequently remark, “Pigs don’t get heavier merely by weighing them every day.” The proponents of testing reply, “You have to weigh the pigs to know if the feed you’re using and the way you’re feeding them is having the desired result.” This task force sought a proper balance between a rich instructional program that focuses on the whole child and an accountability system that allows us, as professionals, to evaluate whether we are accomplishing our purpose.

As a union of professionals we believe that our members help students learn and grow in many ways and we are professionally accountable for providing a rich educational environment to help the students in our classes succeed. This does not mean that our students or our performance should or can be judged by a single test score from a single day. The negative consequences of a high stakes testing process are more deleterious than any apparent gain in accountability measures. Proponents of testing are quick to portray any attack on high stakes testing as an attempt to avoid accountability. That is not our purpose.

The work of the task force

Over the past several months this task force, composed of teachers from all levels, clinicians, teacher center specialists and UFT staff, made an intensive study of the topic, familiarizing themselves with the actual assessment tools in use in the city’s schools as well as federal, state and city regulations and policies regarding testing and assessment. The task force members also studied testing and assessment policies in other states. They utilized and shared articles, opinion pieces and research from a variety of sources and perspectives to help them in their discussions. In addition, task force members designed and participated in the aforementioned November 2006 conference as well as the citywide forums described below in order to gain as wide a perspective as possible about the impact of high stakes testing and to provide information about the issue to the public.

Among the questions for which the task force sought answers were:

How has the emphasis on tests in the city's Children First agenda and the federal NCLB legislation affected what goes on, day-to-day, in schools and classrooms in our city?
What are alternatives to standardized high stakes tests in assessing student achievement?
What are the proper uses of tests and assessments?
How can the proper use of time, resources, professional development and student data help improve teaching and learning and ensure a quality education for all our students?

To guide them, task force members shared articles, opinion pieces and research on the broader issues of testing and assessment. Their own professional experiences, based on the situation in the schools in which they teach or with which they are familiar, also guided the members of the task force. The task force was fortunate that among its members, all of whom were knowledgeable and passionate, were teachers from Urban Academy, a member of the New York State Performance Standards Consortium. Teachers at this school, and others in the consortium, have extensive experience in developing and using performance-based and alternate assessments such as projects and portfolios for the evaluation of student learning.

The UFT Forums on High Stakes Testing

In a series of public forums on high stakes testing this task force held throughout the city in December 2006 and January 2007, teachers, parents, students, elected officials and others with a stake in public education spoke out about the adverse effect the current high stakes testing culture is having on instruction and learning. While recognizing the importance of testing and assessment as an indicator of student progress and a valuable resource for guiding instruction, these forums showed that many members of the public are concerned about excessive testing as well as inappropriately linking the results of these tests to important decisions about a student's or a school's future. Concerns were also expressed about the pressure to raise student test scores, the narrowing of the curriculum in order to focus upon only the skills and knowledge needed to pass the tests, the demands that testing and test preparation put on instructional and professional time, the decreased flexibility in professional decisions teachers experience as a result of the emphasis on tests, and how these high stakes tests are affecting the education of special populations of students such as English language learners and students with disabilities.

"We're teaching students to be test takers"

One speaker, summing up the feelings of many, noted, "Students are being taught to take tests, to figure out what the test wants one to say and then to say it. They are not being taught to be critical thinkers."

A 4th grade teacher complained, "We are losing classes due to test prep. Before we dropped science and social studies to prepare for the ELA (English Language Arts) exam; now we are preparing for the math [exam]. Now there is no art."

Many teachers bristled at the lack of respect shown for their professional autonomy and decision making regarding the education and the future of the students they teach every day. Principals and administrators prefer, they said, to look only at student test score data on a single test when making decisions about that child, ignoring teacher input. "My experience is worth a lot more than 'open up to page five,'" said one teacher.

In addition to the citywide reading and math tests, teachers must now also administer pre-packaged interim assessments such as ECLAS-2, EL SOL and Princeton Review. These assessments often have little relationship to the actual material being taught and publishers provide little or no information regarding their technical adequacy as predictors of future achievement. Many principals also mandate the use of ongoing assessments such as portfolios, anecdotal binders, running records and reading assessment charts that are often duplicative, generate excessive paperwork and are ill suited as diagnostic tools for certain subjects and students. Although they might occasionally provide useful data, given the current focus on testing there is, paradoxically, little opportunity to translate that data into real, non-test preparatory instruction. One forum participant characterized herself and her colleagues as "The Stepford Teachers" because they are forced to follow a lock-step teaching model focused solely on test preparation.

"What good are the test results if we never get them?"

Teachers also said the results of the many tests and assessments New York City's public school students must take far too often do not come back to them in enough time to help them guide individual student learning. This includes the interim assessments whose stated purpose is to give teachers guidance on what the student needs to learn to do better on the real test. Teachers cited delays of up to eight months between test administration and release of scores. To be useful, teachers said, the turnaround time for releasing results of assessments conducted during the school year should be as brief as possible, a matter of days at most, in order to provide timely information to teachers, parents and the students themselves. It is not clear that any test publisher or outside vendor has the capacity to do that for a single school, let alone for a school system the size of New York City.

"The Bubble Student"

Teachers at the forums also discussed the phenomenon of the "bubble student," the student who scores within a few points of the next level of performance and who with extensive test preparation can move up a level and "improve" the overall performance of the school. These students are often singled out for intensive preparation for the test, in the guise of tutoring, to try to ensure that they make it to the next level. The students who need the most help, paradoxically, get less because they are less likely to make the jump from level one to proficient. In this climate where scores are everything test preparation is becoming a subject unto itself.

"I am reading test preparation booklets, not Shakespeare," said a middle school student.

Teachers at the forums said their students are no longer individuals with strengths, interests, talents and challenges but are looked at by data analysts as "a 1, 2, 3, or 4," the number corresponding to the student's level of proficiency on a reading or math test. Nor is the "bubble student" phenomenon limited to this city's schools. Schools around the country, operating under the NCLB sanctions for failure to make AYP, are desperate to improve test scores. The Washington Post of March 4, 2007 reported in an article "A Concentrated Approach to Exams" that one Maryland school singled out Black, Latino and Asian students who had the best chance of improving their [and the school's] scores and offered them intensive test preparation while ignoring those who needed the most help but were seen as unlikely to improve substantially.

This emphasis on test scores and the corresponding data they yield about levels of student proficiency as the most important (and the de facto only) indicator of school quality above all other possible indicators means, unfortunately, that teachers are unable to give, and according to forum participants are often prevented from giving, students at the lowest and highest levels of test score performance the individualized attention they should get. English language learners, gifted and talented students, transfer students, students with histories of erratic attendance, special education students and others with specific needs are too often ignored. Administrators, whose evaluation unfairly depends on the overall performance of the school, as determined by student test scores, focus their attention on test preparation rather than on a high quality educational program for all with appropriate student supports and individualized instruction.

"Most AIS (academic intervention services) instruction is devoted to pull out every day for the tests coming up," said one elementary school teacher.

Students sense this too. One parent said her middle school age son said, "If you get a 4 (the highest level of proficiency) then you don't even need to stay awake for anything else."

"What about students with specific needs?"

This excessive testing is harmful to specific groups of students. Teachers and parents talked about the NCLB requirement that special education students must take tests at their grade level, a requirement that prevents them from demonstrating true progress, reinforces feelings of inadequacy and frustration and alienates them from school, increasing the likelihood that they will ultimately drop out. The law permits only a small percentage of the most profoundly disabled to be exempt from the standardized testing program, though states must develop alternative measures to evaluate their progress. The law requires students who are new to this country and whose native language is not English to take grade level tests in English after only one year, despite research that shows it can take five to seven years for English language learners to acquire the English language skills necessary to perform on a level with their peers. This requirement generates the same feelings of alienation and frustration among English language learners as it does in other students with special needs.

"If it's not tested it doesn't matter"

Teachers and parents in all five boroughs also spoke about the elimination of and cutbacks in art, music, social studies, science, physical education and foreign language programs, as well as field trips and after-school recreational programs in order to provide more time for test preparation for the high stakes math and English language arts tests. A member of a Community Education Council noted that only 40% of the elementary schools in his borough now offer physical education. The city's announcement on March 6, 2007 that it is implementing a citywide science curriculum is a welcome acknowledgement that a well-rounded curriculum has to include more than test prep for math and English but other subjects remain eliminated, ignored or curtailed. But the announcement does nothing to prevent principals from canceling science in order to find time to maximize performance in English and math by maximizing test preparation.

Consider the comments of a middle school teacher from Brooklyn who described the emphasis on test preparation in her school. “Subjects like social studies are put on the back burner so that everyone can focus on the tested subjects of ELA and math. In our school a science lab was disbanded to make more room for ELA and math classes even though the school has an accelerated science program where 8th graders can take the biology regents but now they don’t even have a lab to do dissections.” In a test driven climate only subjects for which there are tests are valued.

"I'm concerned about the health of my child"

Parents spoke with understandable emotion about the adverse effect excessive testing has had on the physical and emotional health of their children. Students as young as eight and nine have characterized themselves as "failures" due to their score on a single math or English language arts standardized exam. According to parents, their children often experienced sleepless nights, vomiting, severe headaches and anxiety attacks during the days leading up to the fateful high stakes test. A school psychologist said she teaches students how to "de-stress the test". She described how children tell her they become so anxious before the tests that they have stomach aches and can’t sleep, and she pointed out that test anxiety causes students to forget what they know and affects how well they listen to directions and to stories read aloud to them. A PTA president said, "Our kids are getting lost in these tests."

Once again, the issue is national. A January 12, 2007 article in USA Today "How Bush Education Law has changed our schools" quotes Carmen Melendez, a bilingual language arts teacher, who has since left teaching:

"It was insane. The kids were all jaded. They were tired, they hated school. They're 8 years old, and they're so worried about a passing score. I think that's inhumane."

David Keyes, a second grade teacher in Maryland writes in "Crying Over A Test" in the March 2007 NEA Today:

"The test prep program completely changed the classroom culture. In writing I might expect a well-edited paragraph from one student, two simple sentences from another. The test prep program sends a very different message: each question has one correct answer, which all students must find. Recently two struggling students who had failed to get a single answer right all week broke down in tears."

"Is it higher student achievement or easier tests?"

Finally, the law of unintended consequences was invoked by classroom veterans as they spoke of the "dumbing down" of tests and the intentional design of tests in order to yield higher scores, which is not surprising now that high test results are so explicitly and inextricably linked to the careers of principals, superintendents, a chancellor and even a mayor. "How do you get students to run five miles faster?" asked a former principal. "Make the five miles three."

"It's not a new problem"

The criticism of high stakes testing is not new. In 1906 a New York State Education Department official made these comments as the legislature contemplated establishing a high stakes test:

“It is an evil for a well taught and well-trained student to fail in an examination. It is an evil for an unqualified student, through some inefficiency of the test, to obtain credit in an examination. It is a great and more serious evil, by too frequent and too numerous examinations, so to magnify their importance that students come to regard them not as a means in education but as the final purpose, the ultimate goal. It is a very great and more serious evil to sacrifice systematic instruction and a comprehensive view of the subject for the scrappy and unrelated knowledge gained by students who are persistently drilled in the mere answering of questions issued by the Education Department or other governing bodies."*

In ignoring this prescient warning, we are reaping exactly the consequences that the education department official foresaw. In a June 18, 2000 op-ed piece in the New York Times "When Testing Upstages Teaching," Kate Zernike quotes Sue Bastien, the head of Teaching Matters, a group of former teachers devoted to teacher training, as saying:

"The standards which should have guided us have bowdlerized into standardized tests which are then used to punish low-scoring schools rather than improve the curricula or teachers' skills."

The problem is not new, but forum participants and educators from around the country have said that it has clearly gotten worse.

* (from Sharon L. Nichols and David Berliner, (2007) Collateral Damage: How High Stakes Testing Corrupts America’s Schools, Harvard Education Press, Cambridge, MA, p. 5)

Based on readings, presentations, public comments and our own discussions the task force offers the following recommendations.

1. New York City must stop the compulsory administration of standardized tests every six weeks.

These mandated, often expensive, packaged tests are duplicative, not connected to what is happening in the classroom on a daily basis, and steal time from instruction. The requirement for mandatory six-week testing can only increase the perception that the role of teachers is not to teach but to test and collect data. Teachers at the forums and elsewhere have said that they learn little about students from these packaged tests that they would not have known otherwise from their own evaluations and assessments. The interim tests the DOE requires teachers to use do not always match the subject matter or skills being taught and the results often come in weeks or months after the tests are given.

Additionally, mandated interim assessments administered at the same time to all students do not necessarily provide reliable data for all students and, at times, may be unnecessary. These interim assessments generate excessive paperwork and, as we have begun to see with ECLAS and Princeton Review, diagnostic assessments are being used not only to assess but also to make high stakes decisions about students' futures, a use for which these assessments were certainly never intended, and a use for which there is no evidence of validity. Although DOE initially stated these interim assessments were solely for internal use for the improvement of instruction, it now plans to include the results of these interim assessments in its newly developed $80 million ARIS data collection and tracking system that becomes operative in September 2007.

2. Tests and assessments are diagnostic and instructional tools. They must not be the sole determinants for student placement, promotion, graduation and other high stakes judgments.

High stakes judgments for schools and students, as mandated by city, state and federal requirements, must be based on multiple forms of evidence, not only standardized tests. Inferences and decisions that educators draw from test results must be only those that the test was designed to measure and truly measures. Teachers should be given access to the test publisher’s data on validity, reliability and the basis for the normative comparison or how criteria for passing were set. Professional development activities in schools should include discussions of the meaning of this material and how teachers can explain scores to parents and students.

Educators know that students develop and learn in different ways and at different rates. In order to assess student learning and make decisions about what’s best for a child’s long term academic success and emotional development, it is necessary to use a variety of means to measure ongoing student progress and performance. Especially important is measuring that which occurs on a daily basis in the classroom, in areas such as reading, writing, speaking, problem solving and critical inquiry. These are skills that are necessary for success beyond high school and are not necessarily measured on a single standardized test or a mandated interim assessment. The state and city should base student promotion on multiple indicators such as grade point averages, performance assessments, portfolios, teacher comments, attendance and test scores.

3. Do not use student test scores to evaluate teachers.

The use of student test scores as a determinant of teacher quality has a simplistic appeal--students who score high on standardized tests must have received instruction from qualified teachers and those who score low obviously did not receive good instruction. To paraphrase H.L. Mencken, most simple solutions are also the wrong solution. The use of data from student test scores on standardized tests to evaluate teachers may appear simple and be intuitively appealing, but it is wrong.

Student scores on standardized tests provide a sample of student performance on a single exam but do not provide other important measurements of student achievement. Standardized tests do not isolate the other factors that may affect student achievement such as class size, the ratio of adults to students, quality of facilities, availability of resources, poverty, attendance patterns, parental involvement, facility with English, and prior schooling experience or an upset stomach on testing day. The importance of these other factors in student achievement was noted by a Chicago Public School principal, Barbara Williams, in a March 6, 2007 Chicago Tribune article, "City grade schools shine on tests." She was "devastated" by the drop in reading scores in her 5th grade, the cause of which she said was "one classroom packed with 33 children and serious behavior problems."

Teachers, principals and those familiar with tests and the science of psychometrics, a number of whom spoke at the UFT's forums on high stakes testing, pointed out that the questions on these standardized tests are frequently highly correlated with socioeconomic status. The questions, which really measure the students' prior knowledge and experience (what they bring to school), provide the best way to ensure a “proper” distribution of scores and account for the subtle bias that many people find in these tests. The test scores, then, do not measure only what happened as a result of instruction but also measure a host of other variables. These scores are not valid measures of teacher performance.

Experts have also pointed to the lack of alignment between standards, curriculum and assessments. The tests and assessments New York City's students take often have no relation to what teachers are teaching. Currently, given the misalignment among standards, curriculum and assessments, the solution to raising student test scores has been the questionable instructional technique of "teaching to the test." Intensive test preparation results in a superficial covering of material and, in many cases, ultimately becomes curriculum and instruction. The noted historian and former Undersecretary of Education Diane Ravitch is one of many educators who have shown how the emphasis on standardized testing narrows and waters down the quality and the quantity of school curriculum, and, as mentioned in the forums, eliminates entire subject areas.

Using student test scores to evaluate teachers would exacerbate this practice. A punitive evaluation system, based on a single test, would place even more stress on teachers to raise student test scores. Although teachers at all levels are feeling this pressure, it falls especially hard on the elementary schools, where teachers do not specialize in one subject and are responsible for preparing students for high stakes tests in more than one area.

The increased weight that test scores carry can channel time and money away from other areas of importance. In order to maximize test score increases, instruction in subjects that are not tested (the arts, foreign languages, physical education) decreases. Even if we accept the premise that the use of student test scores is defensible, how do you evaluate teachers in those subjects where there is no test? Adding more tests is not an answer. Should we reduce, for example, arts education merely to knowing on a multiple choice test who painted Guernica or who composed Tosca, or should students learn to understand and appreciate a variety of forms of artistic expression and find out how to create their own art or music?

Teachers would have even less time to teach material that is not on the test. Test prep would continue to drive instruction, large numbers of students at both ends of the achievement spectrum would be ignored and the curriculum would narrow even more.

Test scores are not accurate enough or reliable enough to justify serious consequences. Consider how the incorrect reporting and scoring of standardized tests, as seen in the 2003 administration of the Math A and Physics Regents exams, as well as other city and state tests over the past six years, affected students and teachers. Diplomas were granted or withheld based on false scores. Most recently, parents and educators raised questions regarding the validity of test items on the 4th grade ELA exam, pointing out that flawed test designs can yield flawed results.

4. In their accountability plans New York City and State must use a variety of indicators. These indicators must recognize sustained growth over time. The city and state should not rely solely on the collection and analysis of standardized test scores or other absolute measures of performance to evaluate schools.

A variety of indicators should form the basis for the evaluation of schools; such a system would especially recognize the continuous success of high performing schools. In order to improve instruction, the state and city should use multiple forms of evidence, such as longitudinal studies of a cohort of students rather than the performance in a grade of a different group of students from year to year.

The emphasis on absolute measures for improvement, such as those New York State uses to formulate its list of the most improved schools or SURR schools, does not take into account sustained achievement at very high levels. Absolute measures that may require increases of a specific number of “points” or another indicator unfairly penalize schools consistently performing in the very high ranges of achievement. An increase of 20 points, for example, to qualify as improved might not be realistic for schools performing in the 90s range. Conversely, the current AYP formula for labeling schools as in need of improvement under NCLB penalizes struggling schools that may show increases in performance based on the indicators in use but fail to meet an absolute benchmark. A school that has a performance target of seven points will receive no credit for six points, no matter the circumstances of the school.
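The seven-point example above can be made concrete with a small sketch. It contrasts an all-or-nothing benchmark with a rule that gives credit for progress. The function names and the proportional-credit rule are hypothetical illustrations only; they are not the AYP formula or any formula proposed in this report.

```python
def absolute_credit(gain, target):
    """All-or-nothing rule: full credit only if the gain meets the target."""
    return 1.0 if gain >= target else 0.0


def progress_credit(gain, target):
    """Hypothetical alternative: credit proportional to progress, capped at 1."""
    if target <= 0:
        raise ValueError("target must be positive")
    return max(0.0, min(gain / target, 1.0))


if __name__ == "__main__":
    target = 7  # the performance target from the example in the text
    for gain in (6, 7):
        print(gain, absolute_credit(gain, target),
              round(progress_credit(gain, target), 2))
```

Under the all-or-nothing rule a school that gains six of seven points is treated the same as a school that gains nothing, while the proportional rule would recognize most of that progress.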

The AFT, along with educators, researchers, test developers and legislators around the country, has called for changes in the AYP formula that give credit for progress towards proficiency. Additionally, schools that show improvement should not lose money and other resources for failing to meet an arbitrary, absolute standard even though they are making progress.

5. The New York State Education Department should fully incorporate the use of performance based assessments in its accountability program that will allow students to demonstrate, over time, strengths, skills and knowledge in a variety of ways and that will engage students in real learning.

The selling point for standardized tests is that they are easy to score, analyze and report to the general public. However, when these tests are the only method for judging student achievement they can, as hundreds of participants in our high stakes testing forums noted, have an adverse effect on teaching and learning and drive wrong instructional and evaluative decisions. The reliance upon excessive and expensive standardized tests is designed not to educate but to produce quick and often "dirty" scores that are used to label and punish. Rather than a punitive and narrow view of what constitutes a good school and effective teaching, we should direct our efforts at providing alternative methods of assessment that can yield a richer, more sophisticated, more realistic view of what and how students are learning. These alternatives do exist.

States such as Nebraska, Wyoming, Maine and New Hampshire have incorporated multiple forms of assessment into their NCLB mandated statewide accountability systems in addition to or in place of a single measure of student performance derived from a standardized test. None of these states use standardized tests to make high stakes decisions. In Rhode Island administrators may use no more than 10% of a school's test scores to determine a school's overall quality rating, or a student's promotion, placement or graduation. Rhode Island also includes teacher, parent, student and administrator surveys about the learning environment in each school. It requires financial information about how schools spend their money. A team of evaluators also visits schools to assess progress and provide input into the final evaluation.

6. New York City must continue to support and fund the use of formative assessments in schools and classrooms.

Interim assessments, by definition, are intended to be formative. Formative assessments are what teachers do on a daily basis and teachers use these assessments, based on their professional judgment, to shape, develop, individualize and form instruction before students have moved on to a new topic or skill. When teachers give these formative diagnostic assessments the students, understandably, may not have reached the level of mastery that continued instruction can provide. Formative diagnostic assessments by their very nature are not meant to contribute to student grades or high stakes decisions; they are meant to improve instruction.

The assessments that the DOE is mandating are not designed to accomplish that purpose and they do not provide timely enough data to teachers to guide instruction. Although the department permitted some groups of schools to use "design your own" (DYO) assessments during the current school year, it is now suggesting that in the next school year it will only allow schools to choose their interim assessments from an approved list of department vendors. There is rarely, if ever, alignment between packaged interim assessments, packaged test preparation materials and the state or city exams. The purchase of such materials does, however, provide a great deal of profit for test publishers, money that could be better spent in classrooms. Teachers must be allowed and supported in developing formative assessments tied directly into their day-to-day instruction, so that they can give students feedback that students can immediately use to reflect on their strengths, their weaknesses and what they have to do to make progress. This kind of assessment is the single way to motivate students to take charge of their own learning.

New York State has already laid the groundwork for this. In the 1990s, at the time the state was reviewing its testing program, the New York State Board of Regents approved the establishment of an Assessment Quality Assurance and Assistance Panel as part of the state's Compact for Learning. This panel was intended to help schools move from a total reliance on standardized tests to a system that incorporates other forms of assessment. This panel would also have evaluated the appropriateness of schools' assessment programs and the manner in which they are used, and provided guidelines for the appropriate use of assessment data. It is time to revive this dormant proposal.

Instituting an accountability system that includes multiple forms of assessment must be comprehensive and ongoing and no single component of the system can be singled out for high stakes decisions. It will require professional development supported by appropriate resources to assist teachers in the development, instructional uses and evaluation of student portfolios and other performance based approaches to student assessment. Teachers who have used these types of assessments remind us that in order to incorporate portfolios and other approaches to student assessment as part of an overall program of evaluation their content must be aligned with the standards and the curriculum, just as it should be with standardized tests. Ensuring this occurs must be an integral part of any professional development plan.

The professional development must be geared to subject matter and level. Appropriate materials, technology and research must be available to help teachers in designing appropriate assessments. Teachers unfamiliar with such assessments should have the opportunity to work with colleagues on development and administration of performance based assessments in the classroom. This serious and collegial sharing of knowledge and expertise among teachers in and across schools can ensure that performance assessments do not degenerate into rigid, top-down formulaic evaluations of student performance. This kind of professional dialogue, sharing of ideas and constant monitoring of student performance can do more to really improve instruction than any effort to raise scores by teaching students to answer a few more questions correctly.

The use of these assessments may necessitate changes in school scheduling to accommodate portfolio review panels and other activities related to evaluation of these assessments. Ensuring time, resources and a model for professional development that is collegial and helpful (such as that described in the union’s career continuum proposal) can be subjects we address in collective bargaining.

Teachers too must introduce students to the proper development and use of portfolios as a tool to monitor and track their (students’) accomplishments. Educators can standardize the kind of work a student portfolio should contain for a specific purpose, e.g. graduation. The rubrics teachers use for judging the quality of student portfolios should include input from students when appropriate but can be shared across schools to ensure that they illustrate meaningful performance and hold students to high standards.

Just as teachers, no matter what their level of teaching experience, may be new to portfolio use, so may students. If portfolios and other performance based assessments are to become an integral part of evaluation and accountability systems, then their use must begin before high school, which is where it currently begins in many places. Collections of student work and student driven, teacher guided, performance-based assessments, aligned with the standards, can begin in the early grades and follow a continuum of increasing familiarization and sophistication for students. Students must recognize that these types of assessments will require working in new and different ways.

Learning is a complex process. Teaching is a complex activity. Assessment should recognize and capture this complexity.

7. New York State must encourage and support schools that wish to enter the Performance Standards Consortium.

A model for the kind of assessment system we described above exists in New York State in the Performance Consortium schools that have received waivers from four of the five Regents exams, permitting teachers, on the high school level at least, to use a variety of assessment tools to evaluate student progress and adjust classroom instruction. For example, students may present and defend a portfolio of their work to panels of teachers, students and other educators, similar to the process used to evaluate PhD candidates. Of course all tests and assessments, including portfolios and other alternate assessments, must be valid (the results measure what the test is intended to measure and can be interpreted meaningfully) and reliable (the measures are consistent and objective). This requirement is especially important when tests and assessments assume a significant role in accountability systems.

The state has not granted additional waivers or provided information to schools that might be interested in using such an assessment system. It should allow additional schools that wish to enter the consortium to do so.

8. The UFT should ensure that teachers, parents, and the public in general receive accurate, clear information in order to understand the data that test results, attendance and graduation statistics and other sources can generate.

In order to encourage public understanding of and support for a variety of assessments and their place in holding students and schools accountable for achievement, parents and teachers must be knowledgeable about tests and assessments and receive information that will enable them to work effectively with children at home and understand what is happening in their child’s classroom. The DOE formula for computing a single letter grade for a school is neither clear nor transparent and does not accurately communicate real information about schools to parents, teachers and the public.

A single score or letter grade for an entire school is not a good starting point for a discussion. Teacher comments, written and oral, about specific aspects of a student's performance as well as appropriate rubrics can provide other, often more useful, individualized measures.

In order to increase awareness the UFT should develop a series of conferences, forums and seminars as part of a UFT Institute for Assessment and Accountability, similar to the UFT Teacher Center Urban Educator Forums, where parents and teachers can explore and debate the myriad issues related to assessment and accountability, especially those related to the use of multiple measures to evaluate student performance. We need to make “assessment literacy” a focus in professional development activities. Universities must make it a central element in pre-service teacher education.

9. States, schools, teacher unions and other educational organizations around the country should work together and with the federal government to explore uniform approaches to assessment and accountability based on a sampling of students from all grade levels similar to that the National Assessment of Educational Progress (NAEP) uses.

At this point it seems clear to many that, while well intended, NCLB is a flawed law that is not working to accomplish its good intentions. This is not merely a matter of the failure to fund it adequately. It has set unreasonable expectations and provides only sanctions to enforce those expectations. This punitive model has created a system that focuses on testing and test preparation, not on teaching and learning. It is not closing the achievement gap and the results on state assessments are not matching the results of NAEP or other measures used to audit claims of state performance. It has redefined educational priorities to make test scores in individual schools more important than instruction, planning and curriculum development. NCLB has provided an excuse for the Klein administration to institute excessive and unwarranted testing, and to misuse those tests, in every grade of our city's schools to the detriment of students and teachers. A sampling procedure rather than high stakes tests of every student in every grade would put the focus back onto instruction and student learning.

The current belief in and dependence on testing, test preparation and data collection is harming students in this country. The huge amounts of money states and local school districts spend on packaged tests, test prep materials, testing consultants, and the scoring, collecting, analyzing and reporting of results could be much better spent directly in schools and classrooms. The time spent on test preparation and test administration could be better spent on instruction.

Although NCLB dictates that states must have a system to measure AYP, the components of this system vary wildly from state to state. New York State's (and most states') accountability systems rely almost totally on standardized tests of varying quality with some inclusion of attendance and graduation rates as additional indicators. Nebraska, at the other end of the accountability spectrum, allows school districts to use a portfolio of assessments (that may include standardized tests) that are then sent to the state for rating. Nebraska disseminates those local accountability models it considers the best around the state for possible replication.

No matter how good (or bad) an individual state's system may be, this patchwork of accountability does not provide a true picture of how students in individual states or the country as a whole are doing. This country's educational system needs a consistent accountability system that will encourage in-depth teaching and curriculum and that should include locally developed, valid and reliable performance based assessments. The creation, field testing and implementation of all new tests and assessments must involve educators. Teachers have knowledge of coursework and student development that makes them uniquely qualified to be part of this task. Teacher input can ensure that tests closely match coursework, measure many aspects of student achievement such as critical thinking and skill mastery, and provide useful data that will enhance instruction.

For example, it was only after student results on the spring 2003 administration of the Math A and Physics Regents Exams pointed up the lack of alignment between the coursework and the tests that the New York State Commissioner of Education solicited teacher recommendations for a more valid and reliable August 2003 administration of these exams. This points up the necessity of involving classroom teachers in this process from the beginning and not relying blindly on corporate test manufacturers and pre-packaged testing materials.

An accountability system must include an objective, external and collaborative school review process that assesses effective practices and provides a mechanism for the sharing of these effective practices. Schools, districts and states must report annually on their progress using a defined and agreed upon set of indicators such as uniformly calculated graduation rates, average daily attendance rates, teacher turnover and demographic information. This information can be supplemented with a sampling procedure using a standardized test such as the NAEP. This will allow for comparisons across schools, districts and states for those who believe this is meaningful and necessary.

In order to stop, not only in New York City but throughout the country, the excessive, duplicative, and expensive use of often flawed standardized tests that are not aligned with what is being taught, we should explore the possibility of using one method of tracking student and school progress from year to year nationwide, such as expanding the existing NAEP, currently used in many states but only on a voluntary basis and only as an addition to their local testing program.


The combined city, state and federal requirements have created a testing culture in our schools that is truly excessive and inappropriately high stakes. When the stakes become as high as they are now, unintended consequences can be the result. UFT members support accountability and meaningful assessment, but we are very concerned about the misuse and over-reliance on high stakes tests to evaluate students, teachers and entire schools. We have given many different groups with many different educational and political beliefs the opportunity to express their opinions on the role that tests and assessments should play in our schools. The individuals on this task force mirror the variety of opinions contained in the education world at large. The recommendations represent a consensus of opinion and provide a basis for future discussion.

Implementing some of these recommendations will not be easy. It will require the wise use of time, money, and resources to create an assessment system that supports both good instruction and accountability. It will require educating, and meeting with, representatives of the DOE and members of the New York City Council to demand public explanations of the educational rationale behind decisions that city, state and federal officials make about testing policies and the expenditure of public funds. It will require that we work together with our colleagues in our statewide affiliate, NYSUT, and our national affiliate, AFT, to lobby for help to implement these changes.

We must also present these recommendations to the union membership in a variety of forums, meetings and discussion groups so we can use them to drive specific changes in classrooms. We anticipate that members from all levels, working with a variety of students, will be able to guide the UFT leadership as to how an assessment and accountability program as recommended in this report can improve and enhance what occurs in their schools around the city.

We must recognize that using tests and assessments as tools for accountability is not the same as using tests and assessments as tools for improving instruction. The former is fraught with the pitfalls that come from the intuitively appealing but wrongheaded over-reliance on the scores; the latter can lead to improved student outcomes. In either case tests and assessments cannot stand apart from the other conditions and realities that obtain in our schools and our society. Quality instruction and leadership, small class size, the equitable distribution of resources, student support services, an appropriate and adequate infrastructure, public involvement and up-to-date technology provide conditions for student success that are more important than tests intended to measure those outcomes. The consequences of failing to provide these opportunities will show up in test results, but we must not ignore the causes. Closing the achievement gap among the various groups in our schools and guaranteeing that all students have equal opportunities to learn is a priority. To the extent that tests and assessments can indicate that this is occurring, and can provide models showing where there is success and warnings where there is failure, they are instructive. But mismeasures based on an over-reliance on flawed testing instruments hurt everyone.

How we measure our schools measures our society’s commitment to public education.

