Achieving high standards is the essence of accountability. To measure how well schools and students are meeting high standards, states have developed new assessment systems, or are refining existing systems, to align with the standards. Student scores on these assessments have become the primary indicator of district, school, teacher, and student achievement.
Most states use a mix of tools to measure student performance, including norm-referenced tests, criterion-referenced tests, and performance assessments. They seek a balance, combining open-ended formats that ask students to "invent" solutions to problems with more traditional standardized, norm-referenced tests. At one time, many states dropped multiple-choice test items in favor of performance and portfolio assessments. Testing experts believe these assessments provide a more accurate picture of what students know and can do, but questions about the reliability of such tests have since caused some states, including Kentucky, a leader in education accountability, to reintroduce multiple-choice items (Whitford & Jones, 1999). Others put some multiple-choice items back into assessments to reduce cost and time requirements.
According to Quality Counts, 48 states administer statewide testing programs, and 37 report incorporating "performance tasks" in their assessments. Among the 48, 41 have aligned their tests in at least one subject to standards. Quality Counts reports that 21 states have aligned their standards and tests in all four primary academic subjects.
In addition to using student test scores, states often gauge how well schools and districts are doing by looking at factors such as attendance and dropout rates. In Louisiana, for example, high school students' scores on the state test account for 60 percent of a school's score, and scores on the national test account for 30 percent. The remaining 10 percent is determined by a school's student attendance and dropout rates.
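The Louisiana formula above amounts to a simple weighted average. As a minimal sketch, only the 60/30/10 weights come from the text; the function name, the particular component scores, and the assumption that all components are reported on a common scale are illustrative:

```python
def school_performance_score(state_test, national_test, attendance_dropout):
    """Combine three component scores (assumed to share one scale)
    using the 60/30/10 weighting described for Louisiana."""
    return (0.60 * state_test
            + 0.30 * national_test
            + 0.10 * attendance_dropout)

# Hypothetical component scores for one school:
composite = school_performance_score(80.0, 70.0, 90.0)
print(composite)
```

Because the weights sum to 1.0, the composite always falls between the lowest and highest component scores, so a very weak showing on the state test (60 percent of the weight) cannot be fully offset by strong attendance figures (10 percent).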
Controversy over High Stakes
No one disputes that testing has a place in state accountability systems. Yet, the nature of these tests has steeped them in controversy. Many tests are "high-stakes," meaning that they have significant consequences for students and schools that do not meet achievement expectations. For example, "high stakes" come into play when students are denied promotion or high school graduation because of their low performance on tests, or when a school is totally reorganized because of recurrent low test scores.
Advocates of testing insist that the objectivity of test results ends the uncertainty about what students know and don't know. Local districts and schools can use test results to identify their instructional strengths and weaknesses, and make decisions about their instructional programs. Critics warn that high-stakes tests can distort and narrow the purpose of schooling to the quest for test scores (WestEd, 2000). High-stakes tests encourage teachers to focus solely on what is tested, obscure richer ways of judging schools, and place blame for ineffective teaching on students. Rather than using test scores alone to judge students and schools, some assessment experts recommend treating test scores as one among many sources of information for making such judgments. Critics also argue that testing instruments and technology are not up to the demands that high-stakes accountability places upon them (Linn, 2000).
Even as high-stakes testing becomes integrated into the system, many parents, civil rights activists, and educators are questioning the wisdom of relying on test scores for such decisions as student promotion and high school graduation. Parents in one of Michigan’s most affluent school districts recently rebelled against a new high school proficiency test that they claimed did nothing but embarrass students bound for college. Arguing against the inflated value of one test and the loss of local control, they organized student boycotts, political lobbying, and lawsuits to resist the test. Such tensions are making policymakers listen, and sometimes change their plans.
Tough Decisions for Policymakers
Putting state assessment programs into place involves tough decisions, each creating its own tensions for policymakers. Decisions have to be weighed carefully, both to ascertain their educational value and to gain public support around issues of the design and appropriate use of tests.
Most state decision makers have learned that no single measurement instrument can do all things well. Tests designed to hold schools publicly accountable for student achievement are not the same tests that identify weaknesses or guide instruction; neither can they be used to set improvement targets for schools and districts. States have come to understand that their use of a test must match the purpose for which it was designed. Consequently, they've had to decide what they want their assessment programs to do, and develop, or select, a range of assessment strategies accordingly. States have had to decide whether to develop their own assessments, designed specifically to address their own standards, or to rely on commercial assessments. Developing new tests aligned with standards is a major expense, and states often have few resources available for this development. Purchasing tests may cost considerably less. However, while test publishers do try to align their test items with common elements in state standards, these tests are unlikely to align as closely as a test developed by the state itself.
States also have had to decide what to compare and how often to test students. Comparing one year’s fourth-graders against another’s may not provide a true picture of achievement because the test population is not the same. This was one of the most significant controversies in the implementation of Kentucky’s accountability system. State officials responded by spreading testing to more grades (Whitford & Jones, 2000). Some experts recommend annual testing at each grade level, arguing that annual testing localizes student performance to the most natural unit of accountability, the grade level or classroom. It also yields the most up-to-date information and limits the amount of data that is lost when students move to other schools and districts. While measuring individual student progress each year offers a more accurate assessment, this method is expensive and difficult to carry out among highly mobile student populations.
At the same time, states have had to decide whether to measure absolute performance or growth in performance. Some states, like Arkansas, recognize schools for both absolute levels of achievement and for growth. Louisiana schools, on the other hand, are given a growth target to reach within two years. In making these decisions, states have had to decide what is an acceptable level of performance and what constitutes satisfactory progress. Other questions to address include the following: Should the same rate of progress be expected all the time? How much growth is reasonable to expect? Should the same amount of growth be expected from schools that start at different achievement levels?
Finally, states have had to face the particularly prickly issue of whether to control for differences in student, family, and community characteristics across students. Some districts believe that controlling for differences in prior achievement and student, family, and community characteristics across schools "institutionalizes low expectations for poor, minority, low-achieving students" (Elmore, Abelmann, & Fuhrman, 1996). Others argue that using data on these characteristics effectively would require collecting them for all students, increasing the data burden for districts, something only the largest districts may be prepared to handle. Most others generally have on hand only the limited administrative data available on students' race, gender, eligibility for free or reduced-price lunches, special education, or limited English proficient (LEP) status.