It's All About the Data
Yesterday I attended an event at the Brookings Institution on the war in Afghanistan. It was moderated by a news correspondent from ABC and included comments from a senior fellow at the Brookings Institution (who is also one of the authors of Toughing It Out in Afghanistan) and a man we were told is an expert on Pakistan. It was generally pretty interesting, and I think I learned a decent amount about where we're at in that conflict. The author was pretty optimistic about our chances in Afghanistan but worried about Obama's plan to slowly begin drawing down troops in the summer of '11. He said he spoke to many Afghan and Pakistani dignitaries in Dubai last week who warned him that those across the region feel that this is a sign of a lack of commitment on the part of the US. As one might expect, we all left unsure about where Afghanistan will be in the coming years.
It's not the war in Afghanistan that I want to focus on here, however. It's one of the comments that came up during the question and answer session of the event. One of the questions was about metrics and what criteria the military should be measuring in order to get an accurate understanding of whether what they're doing in Afghanistan is working or not. The Brookings fellow talked a little while about metrics and how they can best be used in a military context, and then he made a comment that stuck with me. He said that we have to be very careful that we don't rely too heavily on data that are easy to construct or gather. He warned that we often put too much emphasis on these data points because we can create them easily, while the data that are more difficult and time-consuming to create are often more useful in helping us determine whether we're meeting our objectives or not.
I paused for a second, did that silent head-nod thing that let everyone around me know that I agreed with him and, of course, instantly related it to education.
The way that we use data to evaluate progress/learning may be one of the most mismanaged aspects of public education. I'd like to start by discussing standardized test scores.
I think one of the reasons so many people have so many problems with standardized tests is that despite a growing number of common standards, there is still a significant lack of agreement about what exactly a school should teach. Is math really more important than art? Is writing really more important than supply and demand? Should schools explicitly teach social skills? Should all students really be held to the same standard? Are 9th, 11th, and 12th graders expected to be learning too?
Standardized testing data are horribly flawed because they don't tell you how much a student enjoys school (which I think MATTERS). The data don't tell you anything about how musically talented a student is or whether a school has cultivated that talent. And the data don't tell you anything worthwhile about the students who weren't tested, which is about 75% of the population of most high schools. And even for the students and subjects the tests do cover, the results can often be misleading.
When you look at all the things a school does, all the things it should be doing, and all the things it could be evaluated for, standardized tests measure maybe 15% of those things (and even less in schools where multiple grades aren't tested), yet they nevertheless provide the public with about 100% of what they know about a school's effectiveness.
Any rational individual looking at this situation from outside the context of education might naturally be inclined to ask herself: WHAT THE HELL? And she'd be right to do so. The way that we collect and manage data with standardized testing, and the importance we place on it, is ridiculous.
But, as with just about everything education does that seems to lack common sense, the reasons we do it are probably primarily political. I can think of three reasons that standardized testing makes sense, none of which are actually helpful to schools/teachers/students:
- Standardized tests quantify the learning of large numbers of students and compare them to each other in a way that the public thinks they can easily understand (whether they really do or not is a totally different matter). 90 is better than 50, so school A is doing better than school B. We should try to make school B better.
This saves the public A LOT of time. It allows them to make quick value judgments about schools, whereas if they had to look through a lot of qualitative data regarding student improvement like portfolios, student/teacher/principal testimonials and the like, they would probably get very frustrated and demand something that allowed them to more easily compare schools. Qualitative data can often make it very difficult to decide which of two things is better.
"Did school A do better this year than last year? Did it do better than school B? I don't know. There are so many words I have to read! Just give me some numbers; at least I can tell which one is bigger."
- They provide elected officials and educational leaders (e.g. school boards, superintendents, principals) with a means of demonstrating progress in a way the public can quickly glance at and be satisfied with. This kind of numerical data is easily manipulated in ways that most of the public has neither the education to understand nor the time to research. You can find a perfect example of this in this post over at GFBrandenburg's blog.
- As the senior fellow at the Brookings Institution warned, standardized tests are probably our sole measurement of schools because they're the easiest way to collect data. If a student gets a multiple choice question wrong, there's little question about what to do: you take away a point (okay, a lot of scoring schemes may be a little more complicated, but I think the point still applies - it's easier than assigning some sort of quantitative measurement to a sample of student testimonials about whether their school made them love learning). And although states cough up millions of dollars of taxpayer money to fund for-profit testing agencies, the tests are still probably cheaper than compiling all the different qualitative measurements one might come up with (although I could be wrong here - thoughts?).
But standardized tests aren't the only example of data mismanagement in education. I saw a ton of it in everyday practice at my most recent school. I'd like to offer some anecdotes as concrete examples of where we go wrong with data collection in practice.
Data-driven instruction: it's one of today's education buzzwords. The idea is that you should gather evidence of your students' knowledge and understanding prior to moving on with your instruction. It seems obvious that this should be best practice. The only problem is that very few educators are trained on how to create valid data, and even fewer know how to use data. When you consider the amount of time it takes to create data sets for an entire class (or five if you were to teach either five sections of different students or five subjects to the same students), the reality is that doing responsible data-driven instruction becomes overwhelmingly impractical.
If I wanted to create a chart demonstrating whether my students attained a certain skill as a result of my classroom, I'd first have to create a diagnostic assessment to see if they had it before they entered and then a second assessment to see if they had it after instruction, each of which could take a few hours. Then I'd have to have confidence that my assessment really measured what I was hoping it would measure. Most teachers who've tried this can tell you that by the time you go to grade the tests, you often figure out that what you were hoping to measure wasn't measured at all (one reason might be that a student answered a question incorrectly because they didn't understand your language, not because they didn't attain the skill you were hoping to measure). Additionally, if I wanted a way to compare a student's understanding over time or against other students' abilities, I'd have to come up with a way to quantify the data I received (this is often a major inconvenience because not all data can be quantified in a way that is meaningful). Then I'd have to input the data I created into some sort of chart so I could compare it. At the end of this, I'm questioning the validity of what I've just created and why I just spent 10+ hours on a chart when I could have been creating lesson plans or tutoring students.
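For what it's worth, the arithmetic is the easy part of all this. Here is a minimal sketch in Python of the before-and-after comparison described above. The student labels, the 0-10 scale, and every score are invented purely for illustration, and the sketch assumes the hard problems (a valid assessment and a meaningful way to quantify the answers) have already been solved.

```python
from statistics import mean

# Hypothetical scores (0-10) on one skill. All names and numbers are
# made up for illustration; they are not real classroom data.
pre = {"Student A": 3, "Student B": 7, "Student C": 5, "Student D": 6}
post = {"Student A": 6, "Student B": 8, "Student C": 5, "Student D": 9}


def gains(pre_scores, post_scores):
    """Return post-minus-pre change for students who took both assessments."""
    return {
        name: post_scores[name] - pre_scores[name]
        for name in pre_scores
        if name in post_scores
    }


changes = gains(pre, post)
for name, change in changes.items():
    print(f"{name}: {pre[name]} -> {post[name]} ({change:+d})")

average_change = mean(list(changes.values()))
print(f"Average change: {average_change:.2f}")
```

The numbers that come out are only as trustworthy as the two assessments that went in, which is exactly where the 10+ hours go.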
Nevertheless, this is the kind of stuff IMPACT expects DC teachers to be doing. That's policy. It sounds good, but it's often impractical. In talking to 20+ teachers at my former school about the evidence they provided to their administrators in the IMPACT evaluations, every single one of them told me that they just made it up to get a good rating. Some even use the same exact chart every time they go in for a conference, year after year.
Even our administrators didn't know how to create or use data. I witnessed multiple situations in which administrators were creating data, pretending that it was valid, and then pretending to analyze the data in a way so that we could act on it. The reality was that they didn't have a clue what they were doing but knew they needed to do it because it was a part of their talking points. It's best practice.
I'll give two concrete examples here. One comes from my former content-area department head/administrator and one comes from my grade-level administrator.
My department head was the kind of guy who would find some new educational thing online that would get him super excited and then expect every teacher under him to use it regardless of whether they liked it or not. The problem was that he never had to try it because he wasn't a teacher, and I don't think he ever had any experience teaching in a high school classroom in our subject area. So he found the following questions online or at some workshop:
- From whose perspective are we viewing?
- What's new and what's old?
- How do we know what we know?
- How are things, events, and people connected to each other over time?
- So what? Why does it matter?
He asked that we have our students answer these questions for every primary source that we gave them. The problem was that even we didn't understand how some of them should be answered. Then he told us how to score them. He gave us a rubric that allowed a student to achieve a 3, 2, or 1 on each question and then asked us to score each other's students' responses. Most of the responses were given a 1 because, like us, the students didn't know what was being asked of them. Our department head got frustrated and told us we needed to raise the scores, so most of us just faked it.
The problem was that our department head didn't know what he was measuring, didn't know how to teach it to the kids, and certainly didn't know how to communicate it to the teachers. So we all just jumped through the hoops of "data-driven instruction" and wasted countless hours of our lives.
(I will quickly say that I believe the questions can be powerful instructional tools if a teacher is WELL trained in how to use them, which we were not. But as measures of assessment, they're sketchy at best, especially when you're attempting to quantify a student's understanding of a particular standard.)
The other example comes from my grade-level administrator, who (unfortunately for her) was responsible for administering DC-BAS and DC-CAS exams to the tenth grade. The DC-BAS is like a practice standardized test that is administered four times a year to measure students' improvement in reading and math. Unfortunately for all of the 10th grade teachers, Discovery Education (the company that creates and scores these exams) did not score the written responses for the DC-BAS assessments. So our administrator occasionally put us to work doing it in our morning meetings. We were given an incredibly bare rubric for assigning a 3, 2, or 1 to each response and no training on what we were looking for. We had counselors grading math tests and world history teachers grading English tests. Because two people had to grade each response and then compare, you'd sometimes see one person give a 1 and another give a 3. They'd stare at each other and say, "Sure, 2 sounds fine." Because none of us had much of an idea as to what we were doing, the end result was that we were making up numbers for the sake of making up numbers. The same scorers wouldn't be scoring the same students the next time around, and you wouldn't be able to do anything actionable with the "results" we were coming up with. But we were at least pretending to do "data-driven" instruction.
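If anyone had wanted to check whether our double-scoring meant anything, one common approach is to measure how often two scorers agree and compare that to what chance alone would produce (a statistic known as Cohen's kappa). Nobody at my school did this, as far as I know. What follows is only a sketch in Python with scores I invented for illustration, and the function names are my own.

```python
from collections import Counter

# Hypothetical 1-3 rubric scores from two scorers on the same eight
# responses. These numbers are invented, not real DC-BAS data.
rater_1 = [1, 3, 2, 1, 3, 2, 1, 2]
rater_2 = [3, 3, 2, 2, 1, 2, 1, 3]


def exact_agreement(a, b):
    """Fraction of responses on which both scorers gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(a)
    observed = exact_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both scorers pick the same score if
    # each assigned scores at random in their own observed proportions.
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both scorers gave one identical score to everything
    return (observed - expected) / (1 - expected)


agreement = exact_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"Exact agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa near 0 means the scorers agree about as often as they would by guessing, and a kappa near 1 means they agree almost all the time. Averaging a 1 and a 3 into a 2 hides exactly the disagreement this kind of check is meant to expose.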
You often hear people say that policy is usually ten years behind the research. Well, I think that practice is probably another ten years behind policy. Yes, we all know that data are important, but few of us know how to create data or use data. Also, most educators don't have the time to do solid data analysis of their students. So while we will use data (Johnny got the answer right when I asked, so he probably understands and I can move on), most of us won't be doing anything near what we're currently pretending to do anytime soon. If this is something that policymakers are going to hold onto as a valuable use of teachers' time, then they're going to have to start requiring teacher preparation programs to include data creation and analysis courses, which will just add to the never-ending (and always a big joke to anybody who's inside of it) list of tasks teachers are responsible for.
It's not that I'm against using data in education. I'm for it. It's just that if we're going to use data, we've got to start using them more intelligently and not for political purposes. And I think, in the here and now, it's the teacher who needs to decide how s/he is going to responsibly use data. S/he's got to decide what's bull coming down from the top for political purposes and what's something that can actually be used. Otherwise, it's all just a bunch of hoops.