There is an article in The New York Times on a major revision by the College Board to the SAT exams. (Bill Hagens shared this in a recent email.) The College Board says its current college admission exams do not focus enough on the important academic skills. The appearance of this article is not quite serendipitous, since testing is a hot topic right now. But it inspired me to continue the discussion on testing we started at our last book club meeting.

Why do we do testing? Of any sort, e.g., of automotive parts? We want to know if some object or process does what we want. Ron Boothe said that we need to make sure that the test actually tells us what we want to know. I would like to expand on Ron’s point and ask a couple of additional questions of a similarly basic nature.

  1. What do we want to know? I want to know when I am running out of windshield washer fluid in time to refill it before I need to clean off the windshield while driving. College admissions offices want to know if the students they admit will succeed academically at their college. The requirement of predicting academic success can be rephrased in a verifiable form as predicting grade-point average at college. The assumption is that grade-point average indicates degree of academic success. Secondary school policy and decision makers and parents want to know whether the students in the schools are getting a good education. How can we similarly rephrase good education in a verifiable form? Let’s go with the measure of goodness of education being student performance in school. (Performance in life is too ambitious.) From there policy makers move to focusing on student performance in test situations. Typically such tests provide a number, which makes it easy to do comparisons. However, ease of comparison is not our original goal. The step of defining/rephrasing good education in a verifiable form is, of course, a difficult/daunting task – one that requires the input/agreement of students, parents, teachers, school administrators and policy makers. The good news is that as a nation we are currently engaging in a discussion on this topic.
  2. Do we need a test for a particular object or process? Tests typically involve a test instrument, which is often costly and may itself fail. I was recently worried when the light on the dash of my car indicating low washer fluid came on and filling the washer tank with fluid didn’t make the light go off. I brought my car to the mechanic. He determined that the warning light was faulty and that it would cost $200 or more to fix it. He recommended not spending the money and instead inspecting the washer fluid tank periodically to see if it needs fluid. I decided to follow his advice. It is pretty easy to look in the tank to see if it needs fluid. Perhaps the parallel to testing of school children is obvious, but I will say it anyhow. According to the NYT article cited above, the College Board decided their instrument was faulty. They decided to replace it with a new instrument. There is demand for the instrument and money to be made by satisfying that demand. We might still ask, though, whether society needs SATs. Critics of past versions of the SATs say that grades in high school are a better predictor of grades in college than the SATs, concluding that we don’t need the SAT instrument. Looking beyond SATs, do we need a national standards test for primary and secondary education? The answer to this question depends in part on what we are going to use it for. Education reform is the use most mentioned. Do we need education reform? Is the current system broken and/or unable to change itself from within?
  3. Does the test instrument tell us what we want to know? The washer fluid warning light did exactly what I wanted when it was working. It went on when the fluid was low, but with still enough fluid to get me to the next gas station to buy more, even in a rainstorm. In the case of SATs, each student taking the test gets a number score that can be used to rank that student among all other students taking that test. College administrators don’t have to look inside the test to see exactly which questions were asked and how the student answered them. A single number is simple, almost as simple as a washer fluid warning light. Now let’s move on to an instrument that tells us whether our children are getting a good education. Perhaps the instrument should measure the performance of students in the schools (as proposed in paragraph 1), which in turn can be used to determine if a student, a teacher or a whole school is underperforming. Clearly we can devise a test that produces numerical results, and those numbers can be used to establish a ranking among students, among teachers and among schools. But does that ranking capture underperforming? And should it be used to terminate teachers and schools? I believe that it is possible to construct tests to, e.g., score how many spelling or grammatical errors a student makes on an essay exam. But it is not clear how many spelling and grammatical errors should label a teacher or school as underperforming in the topic area of composition. And how should we award points for voice and writing style, which probably should be factored into the same topic area?
  4. How does one construct a test instrument to meet a requirement? I can imagine Toyota inserted a float in the washer fluid tank on an arm that trips a switch when that arm moves to a certain position. Such an instrument is clearly testing what we want to test. If I give a test to my computer literacy students that requires them to repeat back material in the textbook or lecture, it is clear that that test is testing whether the student read the book or attended the lecture and remembered the material. Or if, as in Peter Farnum’s example, a question about the motion of tea leaves in a cup of tea is intended to require the student to apply the concepts presented in class about weather patterns by analogy, it seems clear that this question tests whether the students moved beyond memorization to the logic of the weather process. Now for a negative example. I took an honors calculus course my freshman year in college along with 25 other students who scored well on the advanced placement exam for calculus. The professor lectured the whole term, or in fact didn’t really lecture, but wrote proofs of theorems on the board the whole term. The final exam consisted of applications of the theorems. There were no proofs on the exam. He failed the whole honors class. The administration forced him three times to scale the results upward. In the end I received 89 out of 100 for the course. To this day I shake my head over the total meaninglessness of that grade. I have not participated in developing questions for the College Board (I have participated in internal evaluations inside a college). I wonder if their process of deciding which questions to use to test achievement in topics from school stands up under scrutiny. The fact that they ever inserted vocabulary such as “depreciatory” and “membranous” in the SATs suggests to me that their process is not very well thought out (example from the article referred to in the first paragraph).
What worries me even more is how national standardized tests are constructed to test underperformance of students, teachers and schools – tests that will impact so many people. Do those who mandate and construct these tests feel the huge moral burden of getting it right – actually measuring underperformance accurately?

I am not opposed to testing. There is an article in The Tacoma News Tribune that describes how tests have been used to flag Stewart Middle School as needing assistance. Stewart is described as a high-poverty school on Pacific Avenue. Now there will be an academic audit of Stewart conducted by outside experts hired by the state. They will recommend how Stewart can be improved. Hopefully they will involve the community, teachers and students in discussions about how best to spend federal grant money to improve the school. I for one want to watch the process of transformation of Stewart Middle School. A test does not have to be perfect if it is simply used to point out where more careful study is needed to determine what if anything needs to be done in a necessarily evolving public school system.

I am opposed to testing that is not well conceived, either because the goals were unclear or the tests were constructed in a poor way and don’t test the goals. I am opposed to using test results in faulty arguments to roundly condemn public schools and public school teachers as a prelude to providing public funds to private institutions (see Diane Ravitch’s Reign of Error). If we are going to test the performance of our educational system, we need to dedicate sufficient resources to developing those tests. And we need to keep the profit motive from overwhelming the public good.

  1. Ron Boothe says:

    Thank you for this thoughtful summary and analysis of our discussion of the role of testing in education at our last Tuesday’s Book Club meeting.

    A much longer article about the revisions to the SAT appeared in the NYT Magazine yesterday:

    It is unfortunate that this article did not appear a week earlier as it is directly relevant to a number of issues we discussed with respect to THE NEW SCHOOL (2014) by Glenn Harlan Reynolds. As we discussed, the Reynolds book highlights several legitimate problems confronting education, but he offers few solutions other than a vague vision that somehow technology and free enterprise will provide the answers.

    This article about the revisions to the SAT gives me some reason for hope that smart people are working on better ways to approach testing. Here is a sample quote from the article:

    “The question for Coleman [president of the College Board as it prepared to make changes to the SAT] was how to create an exam that served as an accurate measure of student achievement and college preparedness and that moved in the direction of the meritocratic goals it was originally intended to accomplish, rather than thwarting them. … their first order of business was to determine what the test should measure.”

    The article points out that the original goal of the SAT was in fact to promote a more meritocratic method for choosing who should be admitted to prestigious universities. Rather than being determined primarily by legacy (my father and his father, and their fathers before them all went to Harvard, and therefore so shall I!), admissions should be based on scholastic aptitude. The basic idea was that all applicants, regardless of whether from a wealthy or a poor family, should have an equal opportunity to compete for admission slots.

    The problem with this lofty principle is that in practice it was soon discovered that it was possible to “game the system” and increase one’s score on the test by paying for expensive courses that taught strategies to use while taking the test. Wealthy families could thus still pay to move up towards the front of the line ahead of those from poor families who had similar “scholastic aptitude”. The new test tries to get around that by “creating a transparent test and then providing a free [Khan Academy] website that any student could use — not to learn gimmicks but to get a better grounding and additional practice in the core knowledge that would be tested…”

    I also learned from this article the rationale for why the new SAT will include “application waivers” that allow students with high SAT scores to easily apply to prestigious universities. This comes from some fascinating statistical research demonstrating that students living in wealthy zip codes who score high on the SAT tend to apply to top-ranked universities. On the other hand, students with similarly high SAT scores living in poor zip codes tend to apply only to nearby colleges. The College Board is now sending out “simple to fill out application waivers to prestigious schools” along with the test score results to students who perform above 1550 on the SAT.

    There is lots more interesting stuff in this article. I encourage everyone to read it.

