Saturday, December 8, 2012

Federal Court Rules That NYS And NYC Boards of Education Discriminated Against Teachers of Color

U.S. District Court: New York City Board of Education Discriminates Against Teachers of Color
Gulino v. The Board of Education of the City of New York and the New York State Education Department is a class action lawsuit filed on behalf of public school teachers of color who are challenging the use of discriminatory tests and licensing rules that have deprived them of equal salaries, pensions, benefits, and seniority. On December 5, 2012, the district court issued a ruling that concluded that the Board of Education’s certification exam was discriminatory and not job related in violation of Title VII, but denied Plaintiffs’ request to proceed as a class under a particular class certification procedure, in obtaining back pay for having been unjustly terminated.

press@ccrjustice.org

December 6, 2012, New York – In a long-standing civil rights lawsuit brought by the Center for Constitutional Rights and the firm of DLA Piper against the New York City Board of Education, a federal court found that a teaching certification exam was discriminatory and violated Title VII of the Civil Rights Act. As a result, the Board of Education will be subject to independent monitoring to ensure the current version of the exam does not continue to discriminate against minority teachers.
Said Center for Constitutional Rights Legal Director Baher Azmy, “After 16 years in the courts, the Board of Ed and the New York State Education Department are finally being held accountable for racially discriminating against qualified teachers of color in our city. At a time when high-stakes testing is a growth industry, the Court’s decision is an important reminder that such tests must remain fair.”

In 1993 and again in 1996, the State Education Department and the Board of Education mandated that all State and City teachers – entering and experienced teachers alike – pass written certification tests. Yet unlike new teachers entering the classroom, the teachers in this case had already earned master’s degrees, passed content specialty exams, completed other required course work, and received nothing but satisfactory evaluations in their many years of employment in the city schools. The men and women represented in this case lost their permanent teaching licenses, seniority, retention rights, and in some cases their tenured teaching positions, and had their salaries drastically reduced. The Board kept them teaching the same course load, with the same responsibilities, but without the same benefits.

The lawsuit, Gulino v. Board of Education, was filed in 1996 by the Center for Constitutional Rights on behalf of Elsa Gulino, a Latina public school teacher, and three of her African American colleagues. They charged that the certification test they were required to take (called the Liberal Arts & Sciences Test, or LAST) had a disparate racial impact and that the City and the State failed to demonstrate the test was actually related to relevant job requirements. After a series of appeals by the parties, a federal judge ruled yesterday that the test discriminated in violation of Title VII.

Said Joshua Sohn of DLA Piper, “This decision makes clear that the City’s use of the LAST both to deny permanent positions to teacher applicants and to cut salaries and benefits of in-service teachers was unlawful. The court recognized that the LAST provided no reliable information about the qualifications or abilities of the affected teachers.”

The court also ruled that the teachers would need to file a separate action to enforce any monetary damages resulting from the Board’s past use of a discriminatory exam. There will be a status conference in the case on January 10, 2013 to explore other forms of relief.

For more information, visit the Center for Constitutional Rights Gulino case page.

The Center for Constitutional Rights is dedicated to advancing and protecting the rights guaranteed by the United States Constitution and the Universal Declaration of Human Rights. Founded in 1966 by attorneys who represented civil rights movements in the South, CCR is a non-profit legal and educational organization committed to the creative use of law as a positive force for social change.
Gulino v. The Board of Education of the City of New York and the New York State Education Department

Federal Court Docket

Synopsis
Gulino v. The Board of Education of the City of New York and the New York State Education Department is a class action lawsuit filed on behalf of public school teachers of color who are challenging the use of discriminatory tests and licensing rules that have deprived them of equal salaries, pensions, benefits, and seniority.

Status
On December 5, 2012, the district court issued a ruling that concluded that the Board of Education’s certification exam was discriminatory and not job related in violation of Title VII, but denied Plaintiffs’ request to proceed as a class under a particular class certification procedure, in obtaining back pay for having been unjustly terminated. The Court ruled that an independent monitor would be necessary to ensure the certification exams were no longer discriminatory. As for back pay, the ruling does not stand in the way of those harmed by the discriminatory exam from seeking individual damages or certifying the class for damages under a different class action procedure.

Description
Gulino v. The Board of Education of the City of New York and the New York State Education Department is a case CCR filed in November 1996 on behalf of a class of public school teachers of color who are challenging the use of discriminatory tests that have had a detrimental effect on their careers.

With CCR's assistance, the teachers first filed complaints with the U.S. Equal Employment Opportunity Commission (EEOC) in June 1996. They were informed in October 1996 that the EEOC had a tremendous backlog of complaints and that they could initiate a federal suit.

Peter Wilds, Mayling Ralph, and Nia Greene, all Black teachers, and Elsa Gulino, a Latina teacher, charged that when the State Education Department and the New York City Board of Education imposed the requirement that they take and pass the National Teachers Examination (NTE), they lost their permanent teaching licenses, seniority, retention rights, and in some cases their tenured teaching positions, and had their salaries reduced drastically. Yet all were retained, on a per diem basis, in the same teaching positions with the same course load.

All of the teachers in the class have master’s degrees, have passed content specialty exams, have completed other required course work, and have received only satisfactory evaluations in their years of employment in the city schools (as long as 15 years). They charge that the NTE has never been validated for any use other than to assess entry-level teachers and that the exam has a disproportionate impact on teachers of color.

In the last two years, thousands of teachers have been demoted from their jobs as a result of enforcement of the requirement that they pass the NTE. In addition to the NTE, CCR is challenging the Liberal Arts and Sciences Test (LAST), another certification examination that has never been validated for any professional teaching assessment. This test, too, shows a glaring gap in the passage rate between whites and African Americans and Latinos. Estimates of the number of public school teachers who have suffered demotion, termination, and salary and other benefit losses now range from 8,000 to 15,000.

CCR has been working closely with a support group of teachers, the Committee for a Fair Licensing Procedure, which argues that the Board's reliance on the NTE to terminate the regular licenses of experienced teachers constitutes discrimination because the test has a disparate impact on minorities.
Timeline
In November 1996, the complaint was filed in the U.S. District Court for the Southern District of New York.

In the early spring of 1997, both the Board of Education and the State Education Department filed motions to dismiss the lawsuit. CCR filed its opposition on behalf of the plaintiffs in March 1997, with the papers fully submitted to the United States District Court for the Southern District of New York by April 1997, but no decision was forthcoming.

In 1999, while discovery was ongoing but slow due to the outstanding motions pending before the court, then-CCR Assistant Legal Director Barbara Olshansky worked closely with teachers who were adversely affected by the two agencies' policies to address their concerns about alternative temporary placement, certification in other states, unemployment insurance, and reinstatement upon passage of the examinations. In addition, CCR worked with terminated teachers to develop a lobbying platform for the institution of more effective, accurate, and less discriminatory evaluation systems in the public schools.

On April 19, 2000, the case was reassigned to another judge, the Honorable Constance Baker Motley, who has a long and distinguished record as a civil rights supporter. Judge Motley scheduled the parties to appear at a hearing to discuss all the motions that remained undecided.

On May 31, 2000, the parties argued the motions to dismiss as well as the plaintiffs' oral motion to compel discovery. After a long and somewhat complicated hearing, Judge Motley - in an unusual and unexpected move - issued a ruling from the bench. The judge found for the plaintiffs on all aspects of the case before her. The motions to dismiss the lawsuit filed by the State Education Department and the City Board of Education, each alleging that they were not the true employers of the teachers and were not responsible for a discriminatory test, were denied. The plaintiffs' request for a discovery schedule was granted. Finally, Judge Motley put in place a complete litigation schedule.

In May and June 2000, CCR became aware of a new practice by the State Education Department that once again appears to affect teachers of color adversely. National Evaluation Systems (NES), the company that developed the new teacher certification examination - the Liberal Arts and Sciences Test - recently began implementing a test score voiding procedure. Under this procedure, teachers that pass the test have their scores voided if they have failed the examination in the past. When NES voids the scores, the State Education Department accepts the company's determination and notifies teachers that their scores have been voided. CCR, representing Mirtha Sebelen, one of the Gulino plaintiffs, appealed the State Education Department's determination. After requesting a handwriting sample from Sebelen for analysis, the department reinstated Sebelen's scores and agreed to expedite her certification.

On July 13, 2001, Judge Motley certified the case as a class action, rejecting government arguments that the class representatives' claims were not sufficiently representative of the proposed classes. Judge Motley also rejected the defendants' claims that class actions were inappropriate to challenge government operations, because the government presumably will adhere to rulings in individual cases. The court found no evidence in this case that the governmental defendants would universally apply a positive ruling in individual cases.

Discovery was finally completed in 2002. In the summer of 2002, defendants moved to dismiss the case on summary judgment, claiming that they are entitled to win based upon their legal theories that they were not the responsible parties and that plaintiffs' statistical analysis was defective. In November 2002, the judge agreed with CCR's position and ruled against the motions to dismiss in every important aspect.

Trial was held in the case from December 11, 2002 through May 1, 2003. In May, June, and July the parties filed their proposed findings of fact and conclusions of law with the Court.

In September 2003, the Court ruled for the defendants. Noting that the test had not been properly validated, the Court found for the defendants on the basis of one portion of the test that appeared to relate to the job of public school teacher.

On October 14, 2003 the plaintiffs filed their Notice of Appeal to the Second Circuit Court of Appeals.

Plaintiffs' brief in support of their appeal was filed on February 6, 2004.

On August 18, 2006, the Court of Appeals ruled in favor of the plaintiffs. The three-judge panel held that the lower court had applied the incorrect legal standard in judging the evidence before it. The case has now been remanded back to the lower court, where the evidence is to be assessed under the proper standard. Sadly the Honorable Judge Motley passed away during the appeals process, and the case is now awaiting the assignment of a new judge.

In 2007, the defendants appealed the Court of Appeals decision to the Supreme Court, filing for a writ of certiorari.

December 8, 2009: The US District Court for the Southern District of NY set a briefing schedule through April 2010 to resolve any outstanding issues and to determine whether an evidentiary hearing will be necessary. A decision is expected after the briefs are submitted.

Attached Files
12 08 2009 Gulino- Order.pdf
12.05.12 Title VII Order and Class Certification Decision.pdf

Termination of Teachers: VAM Versus SGP

Firing Teachers Based on Bad (VAM) Versus Wrong (SGP) Measures of Effectiveness: Legal Note

Bruce D. Baker

April 3, 2012

School Finance 101 Blog

Legal Issues

Value-Added Assessment

In the near future my article with Preston Green and Joseph Oluwole on legal concerns regarding the use of Value-added modeling for making high stakes decisions will come out in the BYU Education and Law Journal. In that article, we expand on various arguments I first laid out in this blog post about how use of these noisy and potentially biased metrics is likely to lead to a flood of litigation challenging teacher dismissals.

In short, as I have discussed on numerous occasions on this blog, value-added models attempt to estimate the effect of the individual teacher on growth in measured student outcomes. But, these models tend to produce very imprecise estimates with very large error ranges, jumping around a lot from year to year. Further, individual teacher effectiveness estimates are highly susceptible to even subtle changes to model variables. And failure to address key omitted variables can lead to systemic model biases which may even lead to racially disparate teacher dismissals (see here & for follow up, here).

Value added modeling as a basis for high stakes decision making is fraught with problems likely to be vetted in the courts. These problems are most likely to come to light in the context of overly rigid state policy requirements requiring that teachers be rated poorly if they receive low scores on the quantitative component of evaluations, and where state policies dictate that teachers must be put on watch and/or de-tenured after two years of bad evaluations (see my post with NYC data on problems with this approach).

Significant effort has been applied toward determining the reliability, validity and usefulness of value-added modeling for inferring school, teacher, principal and teacher preparation institution effectiveness. Just see the program from this recent conference.

As implied above, it is most likely that when cases challenging dismissal based on VAM make it to court, deliberations will center on whether these models are sufficiently reliable or valid for making such judgments – whether teachers are able to understand the basis for which they have been dismissed and whether it is assumed that they have had any control over their fate. Further, there exist questions about how the methods/models may have been manipulated in order to disadvantage certain teachers.

But what about those STUDENT GROWTH PERCENTILES being pitched for similar use in states like New Jersey? While on the one hand the arguments might take a similar approach of questioning the reliability or validity of the method for determining teacher effectiveness (the supposed basis for dismissal), the arguments regarding SGPs might take a much simpler approach. In really simple terms SGPs aren’t even designed to identify the teacher’s effect on student growth. VAMs are designed to do this, but fail.

When VAMs are challenged in court, one must show that they have failed in their intended objective. But it’s much, much easier to explain in court that SGPs make no attempt whatsoever to estimate that portion of student growth that is under the control of, therefore attributable to, the teacher (see here for more explanation of this). As such, it is, on its face, inappropriate to dismiss the teacher on the basis of a low classroom (or teacher) aggregate student growth metric like SGP. Note also that even if integrated into a “multiple measures” evaluation model, if the SGP data becomes the tipping point or significant basis for such decisions, the entire system becomes vulnerable to challenge.*

The authors (& vendor) of SGP, in very recent reply to my original critique of SGPs, noted:

Unfortunately Professor Baker conflates the data (i.e. the measure) with the use. A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: To separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

http://www.ednewscolorado.org/2011/09/13/24400-student-growth-percentiles-and-shoe-leather

That is, the authors and purveyors clearly state that SGPs make no ATTRIBUTION OF RESPONSIBILITY for progress to either the teacher or the school. The measure itself – the SGP – is entirely separable from attribution to the teacher (or school) of responsibility for that measure!

As I explain in my response, here, this point is key. It’s all about “attribution” and “inference.” This is not splitting hairs. This is a/the central point! It is my experience from expert testimony that judges are more likely to be philosophers than statisticians (empirical question if someone knows?). Thus quibbling over the meaning of these words is likely to go further than quibbling over the statistical precision and reliability of VAMs. And the quibbling here is relatively straightforward, and far more than mere quibbling I would argue.

A due process standard for teacher dismissal would at the very least require that the measure upon which dismissal was based, where the basis was teaching “ineffectiveness”, was a measure that was intended to INFER a teacher’s effect on student learning growth – a measure which would allow ATTRIBUTION OF [TEACHER] RESPONSIBILITY for that student growth or lack thereof. This is a very straightforward, non-statistical point.**

Put very simply, on its face, SGP is entirely inappropriate as a basis for determining teacher “ineffectiveness” leading to teacher dismissal.*** By contrast, VAM is, on its face, appropriate, but in application fails to provide sufficient protections against wrongful dismissal.

There are important implications for pending state policies and current and future pilot programs regarding teacher evaluation in New Jersey and other SGP states like Colorado. First, regarding legislation, it would be entirely inappropriate and a recipe for disaster to mandate that soon-to-be available SGP data be used in any way tied to high stakes personnel decisions like de-tenuring or dismissal. That is, SGPs should be neither explicitly nor implicitly suggested as a basis for determining teacher effectiveness. Second, local school administrators would be wise to consider carefully how they choose to use these measures, if they choose to use them at all.

Notes:

*I have noted on numerous occasions on this blog that in teacher effectiveness rating systems that a) use arbitrary performance categories, slicing decisive arbitrary categories through noisy metrics and b) use a weighted structure of percentages putting all factors alongside one another (rather than sequential application), the quantified metric can easily drive the majority of decisions, even if weighted at a seemingly small share (20% or so). If the quantified metric is the component of the evaluation system that varies most, and if we assume that variation to be “real” (valid), the quantified metric is likely to be 100% of the tipping point in many evaluations, despite being only 20% of the weighting.
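The arithmetic behind that point can be sketched roughly as follows. This is only an illustration under stated assumptions: a hypothetical two-component evaluation (80% observation, 20% test-based metric), components treated as uncorrelated, and made-up spreads for each component. The function name `variance_shares` and all numbers are invented for the example.

```python
def variance_shares(weights, std_devs):
    """Share of composite-score variance contributed by each component.

    Assumes the components are uncorrelated, so the variance of the
    weighted sum is the sum of (weight * std_dev) ** 2.
    """
    contributions = [(w * s) ** 2 for w, s in zip(weights, std_devs)]
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical numbers, not taken from any real evaluation system:
# observations are weighted 80% but tightly bunched (std dev 0.1),
# the test-based metric is weighted 20% but widely spread (std dev 1.0).
obs_share, metric_share = variance_shares([0.8, 0.2], [0.1, 1.0])
print(f"observation share: {obs_share:.0%}, metric share: {metric_share:.0%}")
```

With these invented spreads, the component carrying only a fifth of the nominal weight accounts for most of the differences between teachers' composite scores, which is the sense in which it can become the tipping point.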

A critical flaw with many legislative frameworks for teacher evaluation and district-adopted policies is that they place the quantitative metrics alongside other measures, including observations, in a weighted calculation of teacher effectiveness. It is this parallel treatment of the measures that permits the test-driven component to override all other “measures” when it comes to the ultimate determination of teacher effectiveness and in some cases whether the teacher is tenured or dismissed. A simple logical resolution to this problem is to use the quantitative measures as a first step – a noisy pre-screening – in which administrators – perhaps central office human resources – might review the data to determine whether the data are indicating potential problem areas across schools & teachers – knowing full well that these might be false signals due to data error and bias. But, the data used in this way at this step might then guide district administration on where to allocate additional effort in classroom observations in a given year. In this case, the quantified measures might ideally improve the efficiency of time allocation in a comprehensive evaluation model but would not serve as the tipping point for decision making. I suspect, however, that even used in this more reasonable way, administrators will realize over time that the initial signals tend not to be particularly useful.

**Indeed, one can also argue that a VAM regression merely describes the relationship between having X teacher, and achieving Y growth, controlling for A, B, C and so on (where A, B, C include various student characteristics, classroom level characteristics and school characteristics). To the extent that one can effectively argue that a VAM model is merely descriptive and also does not provide a basis for valid inference, similar arguments can be made. BUT, in my view, this is still more subtle than the OUTRIGHT FAILURE OF SGP to even consider A, B & C – which are factors clearly outside of teachers’ control over student outcomes.

***A non-trivial point is that if you review the conference program from the AEFP conference I mentioned above, or existing literature on this point, you will find numerous articles and papers critiquing the use of VAM for determining teacher effectiveness. But, there are none critiquing SGP. Is this because it is well understood that SGPs are an iron-clad method overcoming the problems of VAM? Absolutely not. Academics will evaluate and critique anything which claims to have a specific purpose. Scholars have not critiqued the usefulness of SGPs for inferring teacher effectiveness, have not evaluated their reliability or validity for this purpose, BECAUSE SCHOLARS UNDERSTAND FULL WELL THAT THEY ARE NEITHER DESIGNED NOR INTENDED FOR THIS PURPOSE.

This blog post has been shared by permission from the author.
Readers wishing to comment on the content are encouraged to do so via the link to the original post.
Find the original post here:

School Finance 101

The views expressed by the blogger are not necessarily those of NEPC.

BLOG AUTHOR

Bruce D. Baker

Bruce D. Baker is a Professor in the Graduate School of Education at Rutgers, The State University of New Jersey, where he teaches courses in school finance policy and district business management. His recent research focuses on state aid allocation policies and practices, with particular...

Pondering Legal Implications of Value-Added Teacher Evaluation

Posted on June 2, 2010

I’m going out on a limb here. I’m a finance guy. Not a lawyer. But, I do have a reasonable background on school law thanks to colleagues in the field like Mickey Imber at U. of Kansas and my frequent coauthor Preston Green at Penn State. That said, any screw ups in my legal analysis below are my own and not attributable to either Preston or Mickey. In any case, I’ve been wondering about the validity of the claim that some pundits seem to be making that these new teacher evaluation policies are going to make it easier and less expensive to dismiss teachers.

=====

A handful of states have now adopted legislation which mandates that teacher evaluation be linked to student test data. Specifically, legislation adopted in states like Colorado, Louisiana and Kentucky and legislation vetoed in Florida follow a template of requiring that teacher evaluation for pay increase, for retaining tenure and ultimately for dismissal must be based 50% or 51% on student “value-added” or “growth” test scores alone. That is, student test score data could make or break a salary increase decision, but could also make or break a teacher’s ability to retain tenure. Pundits backing these policies often highlight provisions for multi-year data tracking on teachers so that a teacher would not lose tenure status until he/she shows poor student growth for 2 or 3 years running. These provisions are supposed to eliminate the possibility that random error or a “bad crop of students” alone could determine a teacher’s future.

Pundits are taking the position that these new evaluation criteria will make it easier to dismiss teachers and will reduce the costs of dismissing a teacher that result from litigation. Oh, how foolish!

The way I see it, this new crop of state statutes and regulations which include arbitrary use of questionable data, applied in a questionably appropriate way will most likely lead to a flood of litigation like none that has ever been witnessed.

Why would that be? How can a teacher possibly sue the school district for being fired because he/she was a bad teacher? Simply writing into state statute or department regulations that one’s “property interest” to tenure and continued employment must be primarily tied to student test scores does not by any stretch of the legal imagination guarantee that dismissal based on student test scores will stand up to legal challenges – good and legitimate legal challenges.

There are (at least) two very likely legal challenges that will occur once we start to experience our first rounds of teacher dismissal based on student assessment data.

Due Process Challenges

Removing a teacher’s tenure status is denial of a teacher’s property interest and doing so requires “due process.” That’s not an insurmountable barrier, even under typical teacher contracts that don’t require dismissal based on student test scores. Simply declaring that “a teacher will be fired if he/she shows 2 straight years of bad student test scores (growth or value-added)” and then firing a teacher for as much does not mean that the teacher necessarily was provided due process. Under a policy requiring that 51% of the employment decision be based on student value added test scores, a teacher could be wrongly terminated due to:

a) Temporal instability of the value-added measures

http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf

Ooooh…Temporal instability… what’s that supposed to mean? What it means is that teacher value-added ratings, which are averages of individual student gains, tend not to be that stable over time. The same teacher is highly likely to get a totally different value added rating from one year to the next. The above link points to a policy brief which explains that the year to year correlation for a teacher’s value added rating is only about .2 or .3. Further, most of the change or difference in the teacher’s value added rating from one year to the next is unexplainable – not by differences in observed student characteristics, peer characteristics or school characteristics. 87.5% (elementary math) to 70% (8th grade math) noise! While some statistical corrections and multi-year measures might help, it’s hard to guarantee or even be reasonably sure that a teacher wouldn’t be dismissed simply as a function of unexplainable low performance for 2 or 3 years in a row. That is, simply due to noise, and not the more troublesome issue of how students are clustered across schools, districts and classrooms.
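To get a feel for what a year to year correlation of about .3 means, here is a toy simulation. It is a sketch under loud assumptions, not a reanalysis of the policy brief: ratings are drawn from a standard normal distribution, the only link between years is the stated correlation, and the function name `repeat_bottom_rate`, the 20% cutoff and the sample size are all invented for illustration.

```python
import math
import random


def repeat_bottom_rate(correlation, n_teachers=20000, cutoff=0.20, seed=0):
    """Simulate two years of ratings with a given year-to-year correlation.

    Returns the fraction of teachers in the bottom `cutoff` share in
    year 1 who land in the bottom `cutoff` share again in year 2.
    """
    rng = random.Random(seed)
    noise_weight = math.sqrt(1.0 - correlation ** 2)
    year1, year2 = [], []
    for _ in range(n_teachers):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        year1.append(a)
        # Build year 2 so that its correlation with year 1 equals `correlation`.
        year2.append(correlation * a + noise_weight * b)
    k = int(n_teachers * cutoff)
    threshold1 = sorted(year1)[k]
    threshold2 = sorted(year2)[k]
    low_first = [i for i in range(n_teachers) if year1[i] < threshold1]
    low_both = [i for i in low_first if year2[i] < threshold2]
    return len(low_both) / len(low_first)


rate = repeat_bottom_rate(0.3)
print(f"bottom-fifth teachers who repeat in year 2: {rate:.0%}")
```

In this toy setup, fewer than half of the teachers rated in the bottom fifth one year are rated in the bottom fifth the next, even though nothing about the simulated teachers changed between years.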

b) Non-random assignment of students

The only fair way to compare teachers’ ability to produce student value-added is to randomly assign all students, statewide to all teachers… and then of course, to have all students live in exactly comparable settings with exactly comparable support structures outside of school, etc., etc. etc. That’s right. We’d have to send all of our teachers and all of our students to a single boarding school location somewhere in the state and make sure, absolutely sure that we randomly assigned students, the same number of students to each and every teacher in the system.

Obviously, that’s not going to happen. Students are not randomly sorted and the fact that they are not has serious consequences for comparing teachers’ ability to produce student value-added. See: http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf

c) Student manipulation of test results

As she travels the nation on her book tour, Diane Ravitch raises another possibility for how a teacher might find him/herself out of a job by no real fault of actual bad teaching. As she puts it, this approach to teacher evaluation puts the teacher’s job directly in the students’ hands. And the students can, if they wish, choose to consciously abuse that responsibility. That is, the students could actually choose to bomb the state assessments to get a teacher fired, whether it’s a good teacher or a bad one. This would most certainly raise due process concerns.

d) A whole bunch of other uncontrollable stuff

A recent National Academies report noted:

“A student’s scores may be affected by many factors other than a teacher — his or her motivation, for example, or the amount of parental support — and value-added techniques have not yet found a good way to account for these other elements.”

http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=1278

This report generally urged caution regarding overemphasis of student value-added test scores in teacher evaluation – especially in high stakes decisions. Surely, if I was an expert witness testifying on behalf of a teacher who had been wrongly dismissed, I’d be pointing out that the National Academies said that using the student assessment data in this way is not a good idea.

Title VII of the Civil Rights Act Challenges

The non-random assignment of students leads to the second likely legal claim that will flood the courts as student testing based teacher dismissals begin – Claims of racially disparate teacher dismissal under Title VII of the Civil Rights Act of 1964. Given that students are not randomly assigned and that poor and minority – specifically black – students are densely clustered in certain schools and districts and that black teachers are much more likely to be working in schools with classrooms of low-income black students, it is highly likely that teacher dismissals will occur in a racially disparate pattern. Black teachers of low-income black students will be several times more likely to be dismissed on the basis of poor value-added test scores. This is especially true where a statewide fixed, rigid requirement is adopted and where a teacher must be de-tenured and/or dismissed if he/she shows value-added below some fixed value-added threshold on state assessments.

So, here’s how this one plays out. For every 1 white teacher dismissed on a value-added basis, 10 or more black teachers are dismissed, relative to the overall proportions of black and white teachers. This gives the black teachers the argument that the policy has a racially disparate effect. No, it doesn’t end there. A policy doesn’t violate Title VII merely because it has a racially disparate effect. That just starts the ball rolling and gets the argument into court.
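To make the "relative to the overall proportions" part concrete, here is a minimal sketch of the arithmetic. The counts are entirely hypothetical (they are not real dismissal data), and the function names are just illustrative. The point is that the comparison is between each group's dismissal rate, not between raw head counts.

```python
def dismissal_rate(dismissed, employed):
    """Share of a group's teachers who were dismissed."""
    if employed <= 0:
        raise ValueError("employed must be positive")
    return dismissed / employed


def rate_ratio(group_a, group_b):
    """Dismissal rate of group_a divided by the dismissal rate of group_b."""
    return dismissal_rate(*group_a) / dismissal_rate(*group_b)


# Hypothetical (dismissed, employed) counts, made up for illustration only.
black_teachers = (50, 1000)
white_teachers = (20, 4000)

ratio = rate_ratio(black_teachers, white_teachers)
print(f"Black dismissal rate: {dismissal_rate(*black_teachers):.1%}")
print(f"White dismissal rate: {dismissal_rate(*white_teachers):.1%}")
print(f"Rate ratio: about {ratio:.0f} to 1")
```

With these made-up numbers the raw counts differ by only 2.5 to 1, but once you account for how many teachers are in each group, the dismissal rates differ by about 10 to 1.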

The state gets to defend itself – by claiming that producing value-added test scores is a legitimate part of a teacher’s job and then explaining how the use of those scores is, in fact, neutral with respect to race. It just happens to have the disparate effect. Right? But, as the state would argue, that’s a good thing because it ensures that we can put better teachers in front of these poor minority kids, and get rid of the bad ones.

But, the problem is that the significant body of research on non-random assignment of students and its effect on value-added scores indicates that the disparity does not necessarily reflect differences in the actual effectiveness of black versus white teachers. Rather, black teachers are concentrated in the poor black schools, and student clustering, not teacher effectiveness, is leading to the disparate rates of teacher dismissal. So they weren’t fired because they were precisely measurably ineffective; they were fired because they had classrooms of poor minority students year after year? At the very least, it is statistically problematic to distill one effect from the other! As a result, it’s statistically problematic to argue that the teacher should be dismissed! It is at least as likely that the teacher is wrongly dismissed as that the teacher is rightly dismissed. I suspect a court might be concerned by this.
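A toy simulation can illustrate the confounding argument. This is only a sketch under stated assumptions, not an actual value-added model: the same pool of teachers (identical true effectiveness) is scored once in classrooms with no systematic disadvantage and once in classrooms where clustering depresses measured scores by a fixed amount. The penalty, the noise level, and the dismissal threshold are all hypothetical values chosen for illustration.

```python
import random


def simulate(n_teachers=1000, classroom_penalty=1.0, threshold=-1.5, seed=0):
    """Return dismissal rates for the same teachers in two classroom settings.

    All parameters are hypothetical; this is not a real value-added model.
    """
    rng = random.Random(seed)
    # One pool of teachers: true effectiveness is identical in both settings.
    true_effects = [rng.gauss(0, 1) for _ in range(n_teachers)]

    def rate(penalty):
        dismissed = 0
        for effect in true_effects:
            # Measured score = true effect - classroom penalty + random noise.
            measured = effect - penalty + rng.gauss(0, 1)
            if measured < threshold:
                dismissed += 1
        return dismissed / n_teachers

    return rate(0.0), rate(classroom_penalty)


low_poverty_rate, high_poverty_rate = simulate()
print(f"Dismissal rate, no classroom penalty:   {low_poverty_rate:.1%}")
print(f"Dismissal rate, with classroom penalty: {high_poverty_rate:.1%}")
```

Because the teachers are literally the same in both runs, any gap in dismissal rates here comes from the classroom penalty and the fixed threshold, not from differences in teacher effectiveness. Real data would of course be messier than this.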

Reduction in Force

Note that many of these same concerns apply to all of the recent rhetoric over teacher layoffs and the need to base those layoffs on effectiveness rather than seniority. It all sounds good, until you actually try to go into a school district of any size and identify the 100 “least effective” teachers given the current state of data for teacher evaluation. Simply writing into a reduction in force (RIF) policy a requirement of dismissal based on “effectiveness” does not instantly validate the “effectiveness” measures. And even the best “effectiveness” measures, as discussed above, remain really problematic, providing tenured teachers reduced on grounds of ineffectiveness multiple options for legal action.

Additional Concerns

These two legal arguments ignore the fact that school districts and states will have to establish two separate types of contracts for teachers to begin with, since even in the best of statistical cases, only about 1/5 of teachers (those directly responsible for teaching math or reading in grades three through eight) might possibly be evaluated via student test scores (see: http://schoolfinance101.wordpress.com/2009/12/04/pondering-the-usefulness-of-value-added-assessment-of-teachers/).

I’ve written previously about the technical concerns over value-added assessment of teachers and my concern that pundits are seemingly completely ignorant of the statistical issues. I’m also baffled that few others in the current policy discussion seem even remotely aware of just how few teachers might – in the best possible case – be evaluated via student test scores, and the need for separate contracts. But, I am perhaps most perplexed that no-one seems to be acknowledging the massive legal mess likely to ensue when (or if) these poorly conceived policies are put into action.

I’ll save for another day the discussion of just who will be waiting in line to fill those teaching vacancies created by rigid use of test scores for disproportionately dismissing teachers in poor urban schools. Will they, on average, be better or perhaps worse than those displaced before them? Just who will wait in this line to be unfairly judged?

For a related article on the use of certification exams for credentialing teachers, see:

Green, P.C., Sireci, S.G. (2005) Legal and Psychometric Criteria for Evaluating Teacher Certification Tests. Educational Measurement: Issues and Practice. Volume 19 Issue 1, Pages 22 – 31

13 Responses “Pondering Legal Implications of Value-Added Teacher Evaluation” →

Nathan Mielke

June 2, 2010

GREAT post Bruce! This is an aspect of teacher evaluation reform that gets NO press. The more you peel away at some of these ideas, the more they stink!

Justin Bathon

June 2, 2010

I left this response on my own blog as well: http://bit.ly/d2Xwd9

Bruce,

You are exactly on point. Both are legitimate legal problems, with the disparate impact being more of the slam dunk in my opinion. The disparate impact numbers would be off the charts and states would have a very difficult time establishing that it is a neutral policy. You start tying that in with school finance stats and other characteristics like age of school buildings and the picture is going to get very dark, very fast (no pun intended). This would have to be a disparate impact case, though, not a disparate treatment case (http://bit.ly/GpQJp) and since it is disparate impact, a class would likely be formed — i.e. someone would need to invest money on the front end of this case to organize it – thus, the NAACP or some kind of organization like that would probably get involved.

The Due Process argument would be a harder (and much lower-profile) case, but it could be brought individually … so we might start seeing a whole lot more of these. The statistical problems you identify are, I think, great, but you are one of the best statistical minds in the education field. Your average Joe Blow lawyer would have a really tough time making that case. And, as long as these cases stayed at the district court level, that case would have to be made over and over and over again by lawyers in each distinct community within each distinct state. If those cases rose to the level of the Circuit courts or the Supreme Court, that would save lawyers some work, but it would still be a costly case to put together and perhaps not worth it to the teachers. Anyway, I think you are right on with the legal analysis. A lot of things in the US don’t make statistical sense, though, and the legal system is just not competent enough to always tease that out.

Another legal problem this would create is that if teacher evals were 50 or 51 percent based on test score improvements … it would make it even more difficult legally to get rid of bad teachers whose student test scores happened to go up. You can put a bad teacher in front of an AP class, and those kids are still going to excel on the test. If that bad teacher has a bad personality, treats parents badly, or any other negative qualitative component for which she would otherwise be dismissed or non-renewed, the test score based evaluation just gave that teacher a silver bullet in court. Probably like your law person there at Rutgers, I teach my principals to not give a reason to pre-tenure teachers when RIFing, because if you give a reason, then you have to defend it in court. These policies not only give a reason, but they give a reason that is largely outside of the principal’s control. Even if it winds up that courts still think that 40% negative qualitative evaluation is enough to still RIF or dismiss a teacher, the number of lawsuits is likely to go up dramatically.

Generally, all this is what happens when you start forcing statistics into the legal system – which is not built for that at all. The legal system is a very qualitatively oriented system, making decisions mostly based on evidence obtained through interviews and the like. The jury, even, is a qualitative system that collectively makes a decision based on all the evidence presented. Statistics throw a wrench in all that because people react differently to numbers. They think numbers don’t lie (although, of course, we know that they can and do). That’s why generally, I don’t love policies that seek to make decisions based solely on numbers – these kinds of things are the result.

David B. Cohen

June 2, 2010

Marvelous post! Why do law makers and policy makers ignore these concerns? I’ve tried raising the same issues myself in Teacher Magazine and on my group blog,
http://accomplishedcaliforniateachers.wordpress.com

Those who persist in making the argument are engaging in the worst kind of wishful thinking. Test scores reflect teaching… it sounds so simple, logical, and appealing, they figure if they just keep repeating it they’ll win the argument. Then there’s the other popular tactic, accusing others of embracing the status quo and evading accountability.

Scott Bauries

June 2, 2010

Bruce,

Fascinating stuff! I added my two cents as a former employment lawyer over on Edjurist.

–Scott Bauries

NancyEH

June 3, 2010

Excellent post; every federal and state lawmaker, federal and state education official, along with superintendents, school boards and anyone else in the least bit involved with public education should be handed/emailed/faxed/tweeted/whatevered a copy and strongly encouraged to read it.

Although it is most likely possible to evaluate regular classroom teachers (although it’s much harder to see for special ed, phys ed, librarians, art teachers, Title 1 teachers, Literacy Specialists, Math Interventionists, etc) at least in part on student achievement, the reality at the moment is that there is no consensus as to which tests should be used or how. In defending a teacher who was being fired, I used NeCAP test results to show success; the school system used NWEA to demonstrate relative failure. We were both right and both wrong; should the teacher have lost the job based on conflicting data? No. But it happened and the teacher chose not to fight, thereby saving the school system significant legal and other fees. They may not be so lucky the next time.

Rich Haglund

June 4, 2010

Of course there are currently many faculty teaching classes that do not currently have standardized tests for their courses. The committee in TN charged with developing guidelines and criteria for the annual evaluation of all teachers and principals employed by LEAs is, among other ideas, considering inviting the teachers associations of those different areas to suggest ways to measure value added growth. E.g., the band directors developing a way to measure student progress over the course of a year.

Another idea being considered is breaking down how the band or art students did on their math tests compared to non-band or non-art students (to potentially bolster the arguments in support of those currently non-core courses), and rewarding the band or art teachers, in part, based on how their students performed on those tests.

Of course, if we had universal vouchers, students and parents could empty the classrooms of incompetent teachers and LEAs would simply have to conduct RIFs.

In the meantime, I think the difficulty of the task and the potential costs of evaluating teachers just as coaches and fans evaluate professional athletes (based mostly on performance–though, I’ll admit, my evaluation of Barry Bonds was based mostly on his apparently poor attitude) should not keep us from the attempt. Our kids are worth it.

Elizabeth

June 4, 2010

Great Posting – and something that teachers seem to already know, but everyone outside of the “inside” can’t seem to get their heads around. I think it would be important, however, to point out that it’s not necessarily a problem with value-added assessments and modeling, per se, but rather the implications (legal and otherwise) of placing reliance on the results of these assessments in areas they do not provide valuable information. In terms of tracking student growth, they can be useful. And if appropriately analyzed, can provide valuable insight into the “effectiveness” of particular teachers or programs. It is the reliance on these inferences for employment decisions that is really the problem here, not their use altogether. I wish I had a solution, but obviously don’t.

Eli

June 11, 2010

Great post – yet, like others have mentioned, frustrating in that anyone who’s spent a modicum of time looking at the issue should see how fraught with problems this is. But I think the cognitive bias of “wishful thinking” comes in strongly here. They want the achievement gap erased, and yet don’t want to do the heavy lifting of the paradigm shift in resource allocation any true solutions might entail.
Cognitive dissonance + easy answers = willful ignorance.

My worry is that going down this road does 2 things: 1) it takes us further away from real solutions and 2) by rewarding performance-via-standardized testing it further incentivizes the teaching market towards the easiest to teach – which are the *least* needy children to begin with.

Diane Ravitch

June 18, 2010

Bruce, Thank you for this excellent analysis. These policies produce perverse consequences from an educational point of view. The more that basic skills tests count and the higher the stakes attached to them, the more they incentivize cheating, gaming the system, narrowing the curriculum, and teaching to the tests. To get value-added growth models, the amount of testing will have to double, so that students are tested in September and again in May or June. More time for testing and “interim assessments,” less time for instruction. Thus, the states that adopt these policies (hoping to win Race to the Top funding) will see less time devoted to the teaching of history, the arts, geography, foreign languages, science, and other subjects that “don’t count.” These policies may or may not lift test scores. Most assuredly, they will not produce good education. Diane Ravitch

schoolfinance101

June 18, 2010

I could not agree more. One of my primary concerns all along has been not only the curricular narrowing, but the fact that the curricular narrowing is disparately distributed because accountability pressures are disparately distributed. Only some schools, serving some children are forced to narrow their curriculum substantially.

Aside from the teacher evaluation issue, a handful of self-proclaimed school finance experts have begun to argue that schools serving poor and minority children should be forced to re-allocate resources – any and all resources – toward improving test scores. In their eyes (Marguerite Roza in particular), kids in low performing high poverty schools should not be wasting their time – and schools not wasting their money – on trivial stuff like ceramics or cheerleading (her examples, not mine). That other children in nearby affluent districts not facing accountability pressures have these resources is of no consequence in Roza’s view. The reality is that it goes much deeper than cheerleading and ceramics, and into the breadth of advanced foreign language offerings and math/literature and other social science electives at the high school level. Unfortunately, think tanks like Center for American Progress and Ed Trust have bought this garbage wholesale… likely because it provides them the politically convenient argument that poor urban schools can be fixed without any new money (just like their view on the teacher quality/evaluation stuff). It also allows them to point the finger at district leaders rather than state officials for the way schools are funded.

But I digress.

Thanks for your comments. We had a fun continued legal discussion on this topic over at http://www.edjurist.com.

NYC Rubber Room Reporter and ATR CONNECT

Termination of Teachers: VAM Versus SGP

Firing Teachers Based on Bad (VAM) Versus Wrong (SGP) Measures of Effectiveness: Legal Note

BLOG AUTHOR

Bruce D. Baker

Pondering Legal Implications of Value-Added Teacher Evaluation

LINK
