July 16, 2003
Apples to apples evaluation of charter schools
The Manhattan Institute Center for Civic Innovation has just released
the Working Paper Apples to
Apples: An Evaluation of Charter Schools Serving General Student
Populations, by Jay P. Greene, Greg Forster, and Marcus A. Winters
(July, 2003). In the report the authors confuse the ability of
schools to improve themselves with their ability to improve their
students, as this Web contribution will explain. It is an elementary
error that completely invalidates the report.
I remark that in February, 2003, the same authors produced a report, Testing High
Stakes Tests: Can We Believe the Results of Accountability Tests?
I wrote a Web
review of that report in which I explained that the authors confused
the predictive power of a high stakes test with its validity as a
measure of student learning. That too was an elementary error that
completely invalidated the report. (In both cases the report's
conclusions are plausible, but that is beside the point.)
The present Apples to Apples report sets out to compare the
performance of charter schools with that of public schools serving
similar populations. (Given the wide range of educational policies in
place in charter schools as well as in public schools I'm not sure
that the question is all that interesting, but let's accept the
question anyway.) In order to compare similar schools, the report
focusses on charter schools that serve a general student population,
and the control group of public schools is formed by taking for each
charter school the nearest public school that also serves a general
student population.
The measure of performance is whatever standard statewide test is in
place. Now I remind the reader of
the concept of value-added assessment.
Value-added assessment employs, ideally, performance data on
individual pupils over multiple years, and looks at improvements over
time. It is a way to factor out the effects of different student
backgrounds, because these are, one assumes, reflected in their
initial test performance. If one doesn't have data on individual
pupils then one can use data on grades within a school. In that case
the incremental performance that one cares about is that between a
certain grade in one year and the next higher grade the next year, on
the assumption that this involves approximately the same student
population.
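As a concrete sketch of grade-to-grade value added (the years, grades, and scale scores below are invented for illustration; they are not data from the report):

```python
# Hypothetical mean scale scores, keyed by (year, grade); not from the report.
scores = {
    (2001, 3): 210.0, (2001, 4): 222.0, (2001, 5): 231.0,
    (2002, 3): 212.0, (2002, 4): 225.0, (2002, 5): 233.0,
}

def cohort_gain(scores, year, grade):
    """Gain of the cohort that sat in `grade` in `year`: compare that
    grade's score with the next higher grade's score the next year,
    which involves approximately the same group of students."""
    return scores[(year + 1, grade + 1)] - scores[(year, grade)]

print(cohort_gain(scores, 2001, 3))  # grade 3 of 2001 vs grade 4 of 2002: 15.0
print(cohort_gain(scores, 2001, 4))  # grade 4 of 2001 vs grade 5 of 2002: 11.0
```

The point of the pairing is that student background is held roughly constant, so the difference reflects what the school added in that year.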
Greene et al. could certainly have used such grade-to-grade value
added assessment in their work. However, they do something
different. They look at the overall performance of each school in one
year and compare it to the overall school performance the next year.
The school performance is measured in whatever way the state measures
it: typically some average scale score or a percentile rank within the
state. They do this for each tested subject separately, but not
separately for each grade. They then compare the year-to-year changes
in performance of the charter schools to the year-to-year changes in
performance of the nearest public schools. They find, finally, a
small (in fact, very small) advantage for charter schools on this
measure. In the executive summary they express their observations as
follows:
Measuring test score improvements in eleven states over a one-year
period, this study finds that charter schools serving the general
student population outperformed nearby regular public schools on math
tests by 0.08 standard deviations, equivalent to a benefit of 3
percentile points for a student starting at the 50th percentile.
These charter schools also outperformed nearby regular public schools
on reading tests by 0.04 standard deviations, equal to a benefit of 2
percentile points for a student starting at the 50th percentile.
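The standard-deviation-to-percentile conversion in this passage can be checked under the usual assumption of normally distributed scores; this is my own arithmetic check, not a calculation taken from the report:

```python
import math

def percentile_gain_from_sd(effect_sd):
    """Percentile points gained by a student starting at the 50th
    percentile, assuming normally distributed test scores."""
    normal_cdf = 0.5 * (1.0 + math.erf(effect_sd / math.sqrt(2.0)))
    return 100.0 * normal_cdf - 50.0

print(round(percentile_gain_from_sd(0.08)))  # math effect: about 3 points
print(round(percentile_gain_from_sd(0.04)))  # reading effect: about 2 points
```

So the conversion in the executive summary is internally consistent; my complaint is with what is being measured, not with this arithmetic.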
And so, the authors completely confuse a measure of the improvement of
schools with a measure of the improvement of student performance.
Charter schools could be performing wonderfully or they could be
performing dismally relative to public schools in improving student
performance, and it would not be seen on the whole school year to year
test score improvements that are the basis of this report. It
would be seen, of course, in traditional value-added
assessment at the pupil or grade level.
Posted by Bas Braams at July 16, 2003 04:26 PM
Wouldn't the schoolwide measure be pretty much an average of the progress at each grade level? I guess it would be off by one grade, since the oldest students the first year would be gone by the second year, but it would be pretty close and simpler than comparing each grade level. Or am I missing something? Statistics is not my strong suit.
The issue is precisely that the oldest students are gone the next year. Let's say that we are dealing with grade school, K-5, and the annual statewide tests start in grade 3 (a common situation), so our schools are rated on the performance of the kids in grades 3-5. Suppose that the charter schools do spectacularly well. The kids that enter in K are on average at the 20th percentile; by the time they are tested in 3rd grade they are at the 50th percentile, and by the time they are tested in 5th grade they have made it all the way up to the 80th percentile. The charter schools repeat this spectacular success from year to year. On the Greene et al. measure they would not be making any progress; they are flat at somewhere like the 65th percentile. If the charter schools are consistently a pathetic failure, again the Greene et al. measure won't show it. All that it shows is whether charter schools are improving over time relative to public schools. It is a measure of how well charter schools are improving their own performance, not of how well they are improving their students' performance.
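The K-5 scenario above can be turned into a toy calculation, using the illustrative percentiles from the comment (with the 65th assumed for the untested middle grade), not real data:

```python
# Every cohort enters at the 20th percentile; the school lifts it to
# the 50th by grade 3 and the 80th by grade 5.  Only grades 3-5 are
# tested, and this profile repeats identically every year.
tested_profile = {3: 50.0, 4: 65.0, 5: 80.0}

def schoolwide_mean(profile):
    return sum(profile.values()) / len(profile)

# The report's measure: change in the schoolwide mean from one year to
# the next.  The profile is the same each year, so the school looks flat.
print(schoolwide_mean(tested_profile) - schoolwide_mean(tested_profile))  # 0.0

# The value-added view: what the school does for each cohort it tests.
print(tested_profile[5] - tested_profile[3])  # 30.0 percentile points gained
print(schoolwide_mean(tested_profile))        # flat at the 65th percentile
```

A spectacular school and a flat school are indistinguishable on the year-to-year schoolwide measure; only the cohort comparison separates them.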
I found the report somewhat confusing; I think it could be written more clearly. I think when he writes of standard deviations, for example, he is referring to test scale distributions, and not the standard deviations of his sample of score means. But, he should be publishing the calculations and results of the latter statistical tests.
I don't think it is so bad that he is comparing school mean scores in one year to school mean scores in the next year. He can't get any better data than that, I'm sure. He's making do with what is available. And, they must be grade level scores. I don't think there is such a thing as a mean test score for an entire school with several grades mushed together.
I have more trouble with the assumption that he has comparable groups. The fact that he is comparing a charter school to the nearest public school (if I understood that part correctly) almost assures that they are not comparable, since presumably any kid at either school must have made a conscious, deliberate choice for one or the other, ...lots of selection bias (and the kids in charter schools are probably the ones with more initiative, more adventurous). I think it would be better to compare charter schools, or cities with charter schools, to public schools in other, demographically similar cities that have no charter schools, or with those entire cities ...a sort of paired cluster design.
Well, OK, Bas has implied that I am being much too wimpy about this. Maybe he's right. The authors could have done a value-added study--it would have taken more time, would have been more complicated, and would have given them a somewhat smaller sample size. But, in the end, they would have had a better study. There are enough states that test in more than one grade, some even in consecutive grades, that the authors could have used synthetic cohorts.
Bas is right, I believe, that at best with the current study they are looking at how well the schools improved what they were doing in a year's time, and not how much they improved their students. This, in a way, biases the study in favor of the newer schools (the charter schools), which are still early on the learning curve. Newer schools are likely to make larger incremental gains than older schools, because they start from a lower point on the learning curve.
Maybe I was being wimpy about it in part because I am astonished by, but clueless about, what is happening with MI education reports. This is the third one I have read in the past year, which amounts to all that I have read in the past year, that has been thoroughly suspect. I asked a colleague who tends toward the "establishment" side of the fence about this and he said "they have an agenda, you know." Maybe they do; I certainly wouldn't know. Maybe they are so tired of the research fraud of "the other side" that they have become attracted to research without restraint.
The worrying aspect is that dozens of education reform groups around the country accept what they get from MI as gospel, and spread it. They assume, as I used to, that an MI report was dependably valid and reliable.