Tuesday, May 31, 2011

When testing is good

Related to the benefits of flashcards are the benefits of testing. Good tests provide feedback--to teachers and parents as well as to students. Good tests prompt retrieval practice by students, thereby solidifying learning. And, finally, good tests motivate teachers and students to be teaching and learning what they should be teaching and learning about anyway.


Good tests--that's key. When people bemoan testing, they often forget that, amid all the dumbed-down, trivia-focused tests out there, there are also some good ones: ones that don't impose artificially low ceilings and that do measure conceptual understanding and meaningful content knowledge. Much as people would like to believe otherwise, these include standardized, multiple-choice tests like the SATs and the Advanced Placements. Well-designed multiple-choice questions, indeed, are often better barometers of understanding and knowledge than open-ended questions are. Open-ended questions--especially long-answer questions and essay questions--are hard to grade objectively, and are only as good as their graders are.

As my oldest child has just finished taking the SATs, I've had occasion to contemplate how this test has changed since I took it a generation ago. While the addition of an essay, in principle, seems like a good thing, I wonder whether those who spend all their hours grading these essays can be trusted to make the right judgments. And is the essay really an improvement over the analogies section that it has replaced? Adam Cohen argues that it isn't--particularly in measuring clarity of thought and expression. Sure, essays can be an extremely effective assessment instrument, especially essays that are clearly written by students without help from others. But instead of reducing essays to one-dimensional scores assigned by word-weary human beings, why not let college admissions staff see the actual essays and assess them themselves?

What about the issue of teaching to the test?  Ideally, this is a good thing. Tests that measure meaningful, teachable knowledge are perfectly reasonable to "teach to." While I wouldn't want my son wasting his time on an SAT-preparation class (the SATs primarily measure analytical and conceptual skills and cognitive science research suggests that these cannot be taught directly), given a choice between a science or history class that teaches to the Advanced Placement test, and one that does not, I'd far prefer the former. For, in this day and age, without a good test looming in the background, who knows what a science or history teacher would choose to focus on?

13 comments:

Jerrid Kruse said...

You're right there are many good tests. Most knowledgeable educators would not disagree. However, when we use these tests to label kids or teachers, we miss the real utility of the tests - to plan for better instruction.

Also, all...let me say that again...ALL tests are subjective. You cannot remove bias. For example, since you like MC tests, let's consider the creation of such a test. While these tests seem objective, since there is a most correct answer, we have to ask several questions.

Correct according to who?
Whose interpretation of the question is this correct answer based upon?
Who chose the language of the question?
Who chose the distractor responses?
Who decided that the content upon which the question is based ought to be on the test?

If you can be honest about the answers to these questions, you'll see that each step of question creation is fraught with decisions made by a human being - a subjective, biased human being.

Objectivity is a myth, philosophers of science gave up on it long ago. Your belief in objectivity indicates your desire to simplify very complex issues - a likely reason you've set up this false dichotomy between left & right brain (as you use them) education systems.

Katharine Beals said...

"However, when we use these tests to label kids or teachers,"

Notice that I don't mention labeling as one of the benefits.

"we miss the real utility of the tests - to plan for better instruction."

This is one benefit. I discuss several others as well. (See above).

"Correct according to who?
Whose interpretation of the question is this correct answer based upon?
Who chose the language of the question?
Who chose the distractor responses?
Who decided that the content upon which the question is based ought to be on the test?"

Yes, these questions arise in all tests. That's why the good ones are normed, and reflect skill sets that are relevant to the purpose(s) of the test (e.g., Advanced Placement tests assess preparation for college-level classes).

"a subjective, biased human being"

Trivially true of much that we decide not to worry too much about nonetheless. For example, many of us choose to undergo annual physical examinations by human doctors, and, occasionally, surgical procedures conducted by human surgeons.

And trivially true of much that we do worry about nonetheless. For example, many of us are also curious about how we and others do on standardized tests--as if those tests actually correlate with something that isn't totally subjective.

"Objectivity is a myth, philosophers of science gave up on it long ago."

Not the philosophers of science I know best. Some of them even believe that the speed of light is an objective fact--at least for this local sector of our universe! They also believe in such measures as predictive power, independent motivation, Occam's Razor, and falsifiability.

"Your belief in objectivity indicates your desire to simplify very complex issues"

Actually, I think what's simplifying here is the assumption that because all tests are at least somewhat subjective, one can't say that some tests are harder or easier to grade objectively than others are.

"a likely reason you've set up this false dichotomy between left & right brain (as you use them) education systems."

False dichotomies arise when people only consider two alternatives when, in fact, there are additional ones.

Jerrid Kruse said...

"Notice that I don't mention labeling as one of the benefits."

Notice I didn't say you did. Your POV is not all that concerns me.

"This is one benefit. I discuss several others as well. (See above)."

But you didn't mention it. Sometimes what we leave out is more telling than what we say.


"Yes, these questions arise in all tests. That's why the good ones are normed, and reflect skill sets that are relevant to the purpose(s) of the test (e.g., Advanced Placement tests assess preparation for college-level classes)."

Really? Ha. I'll add a question then: who decided the measures of these tests are relevant?


"Trivially true of much that we decide not to worry too much about nonetheless. For example, many of us choose to undergo annual physical examinations by human doctors, and, occasionally, surgical procedures conducted by human surgeons.

And trivially true of much that we do worry about nonetheless. For example, of us are also curious about how we and others do on standardized tests--as if those tests actually correlate with something that isn't totally subjective.Notice that I don't mention labeling as one of the benefits.

"we miss the real utility of the tests - to plan for better instruction."

This is one benefit. I discuss several others as well. (See above).

"Correct according to who?
Whose interpretation of the question is this correct answer based upon?
Who chose the language of the question?
Who chose the distractor responses?
Who decided that the content upon which the question is based ought to be on the test?"

Yes, these questions arise in all tests. That's why the good ones are normed, and reflect skill sets that are relevant to the purpose(s) of the test (e.g., Advanced Placement tests assess preparation for college-level classes).

"a subjective, biased human being"

"Trivially true of much that we decide not to worry too much about nonetheless. For example, many of us choose to undergo annual physical examinations by human doctors, and, occasionally, surgical procedures conducted by human surgeons."

Then why are you so concerned with objectivity?

"And trivially true of much that we do worry about nonetheless. For example, of us are also curious about how we and others do on standardized tests--as if those tests actually correlate with something that isn't totally subjective."

This is an issue with human ego - you're conflating issues.


"Not the philosophers of science I know best. Some of them even believe that the speed of light is an objective fact--at least for this local sector of our universe! They also believe in such measures as predictive power, independent motivation, Occam's Razor, and falsifiability."

An objective fact relative to the observer - sounds like a strong case FOR subjectivity in all things. Also, this sounds like scientists, not philosophers of science. As for Occam's razor, etc., these things provide guidance & indication of accuracy, not objectivity. Also, Popper is well known, but not nearly as universally accepted when you talk of more nuanced science ideas, e.g., the uncertainty principle.


"Actually, I think what's simplifying here is the assumption that because all tests are at least somewhat subjective, one can't say that some tests are harder or easier to grade objectively than others are."

No, what I'm pointing out is your unquestioned definition & use of objectivity. Your last statement tells me you did not critically consider the questions I raised earlier. Yes, the "right" answer is easier to assess, but "right" is in quotes for a reason. Until you understand why, we cannot dialogue.

"False dichotomies arise when people only consider two alternatives when, in fact, there are additional ones."

Actually a false dichotomy is when there is the illusion that a choice must be made, when a middle ground exists.

Jerrid Kruse said...

Please forgive weird formatting/copied text & typos in last comment. I'm doing this on my phone in a car (not driving).

gasstationwithoutpumps said...

The SAT is not a bad test, though it has gotten somewhat easier since I took it back in the 60s. It is now a fairly low-ceiling test: suitable for average college-bound students, but not for distinguishing among top students.

LexAequitas said...

The essay is indeed scored in a rather cursory fashion -- I remember in SAT teacher training they mentioned each essay being scored in a ludicrously short amount of time.

The advice to do well was similarly ludicrous, though it does eliminate the most visible and common errors. You can't really teach someone to write well in a 6-week class, but the errors below are so common that correcting them pretty much guarantees a 3 on a 6-point scale.

1) Obey the margins.
2) Use paragraphs, not a block of text.
3) Have an opening paragraph.
4) Have a closing paragraph.
5) If you can, throw in a long vocab word.
6) Use a semicolon or colon (correctly!) somewhere in the essay.

Jerrid, I don't agree with your implication that the "myth" of objectivity is a sensible objection to the argument here. There is far greater potential for bias significantly affecting the results when grading an essay than in many other types of tests.

Katharine Beals said...

"But you didn't mention it. Sometimes what we leave out is
More telling than what we say."

I don't mention "to plan for better instruction"; instead I say "good tests motivate teachers and students to be teaching and learning what they should be teaching and learning about anyway." Good learning outcomes are better than good plans.

"Sometimes what we leave out is
More telling than what we say."

Indeed. You only mention one reason for tests (the afore-mentioned "plans").

"Then why are you so concerned with objectivity?"

I'm concerned, specifically, with minimizing subjectivity. I want the doctors who examine me, and the surgeons who operate on me, to be as objective as possible in their professional judgments. I want the teachers/graders who grade my kids to be minimally influenced by their subjective preferences for certain kids (or writing styles) over others.

"Really? Ha. I'll add a question then: who decided the measures of these tests are relevant?"

What is ultimately decisive is their predictive power. If the AP becomes less accurate in predicting performance in college-level classes, colleges will stop taking it seriously.

"This is an issue with human ego - your conflating issues."

People's egos don't get worked up about tests that are known to be highly subjective.

"An objective fact relative to the observer - sounds like a strong case FOR subjectivity in all things."

No, the speed of light is *not* relative to the observer. That's what's special about it.

But let's take statements about the speed of light as examples of what I mean by degrees of subjectivity. Consider the two statements:

A. The speed of light is relative to the observer.

B. The speed of light is independent of the observer.

In some (not very useful) sense, A and B are both subjective--trivially, they are both judgments made by humans. But A is more subjective than B because it doesn't square with what humans have observed, and because the predictions that follow from it aren't as accurate as the predictions that follow from B.

Returning to tests, a test whose "correct answers" include answers like A is more subjective than a test whose "correct answers" include answers like B.

"As for occams razor, etc. These things provide guidance & indication of accuracy, not objectivity."

Yes, and greater accuracy is correlated with.... *less* subjective bias.

"Also, Popper is well known, but not nearly as universally accepted when you talk of more nuanced science ideas. Ie: uncertainty principle."

The uncertainty principle is one of the principles most misunderstood by non-scientists. I certainly wouldn't presume that I understand it well enough to relate it to falsifiability.

Falsifiability standards, however, are regularly cited in critiques of grand cosmological theories, e.g., String Theory and the Anthropic Principle.

"Actually a false dichotomy is when there is the illusion that a choice must be made, when a middle ground exists."

This definition is subsumed under the broader definition I provided above.

As far as middle ground is concerned, if you read this blog carefully, you will find many cases in which that is precisely what I'm advocating.

Jerrid Kruse said...

LexAequitas,

You said, "Jerrid, I don't agree with your implication that the "myth" of objectivity is a sensible objection to the argument here. There is far greater potential for bias significantly affecting the results when grading an essay than in many other types of tests."

Your point is key, but flawed. The grading of essays is often believed to be subjective because decisions must be made. Then, when we grade a multiple choice exam, there is one "right" answer, so since the decision has already been made, we believe this makes the exam more objective. However, notice a decision has already been made. The decisions that get made in a multiple choice test are hidden; they are made before actually having any idea what the student knows, or doesn't know.

I personally would rather make my subjective decisions based on what the student writes than on what I believe the student would write.

To link to science, consider confirmation bias. When scientists enter into observations or investigations with a particular idea in mind (as they typically do), they tend to favor observations that support the original idea, and it takes more evidence than maybe it should to convince them their original idea might be flawed (read "The Structure of Scientific Revolutions" for more on this).

So, if I give a multiple choice test I've entered into the observation of students' learning (investigation) with my assumptions about their learning leading the way. The assumptions are contained in the language of the questions, the distractors I use, the topics I choose to have questions on, etc. Leading with my assumptions doesn't sound very objective.

Now, and let's really go for broke, consider a portfolio assessment in which students create artifacts (writings, projects, concept maps, etc.) to demonstrate their learning. Now, when I look at these portfolios I still have my assumptions, but I'm not leading with them. Instead, I'm starting with the students' work and thinking. This is much like the scientist who starts their investigation by looking at the natural world (instead of starting with a set of ideas). Now, this rarely happens in science - usually only when a scientist notices something he/she was not at all expecting.

So, my point is that both MC and "other" forms of assessment are equally subjective. The difference is whether we use our subjective assumptions to create the test (lead with assumptions), or to grade the test (make assumptions about our interpretation of students' thinking), or some mix of both.

Jerrid Kruse said...

"Good learning outcomes are better than good plans."

Wouldn't good plans be based on good outcomes? If you only have outcomes, you'd expect kids to just "figure stuff out" (Discovery learning). I can't imagine you'd support that. And if you do, there is a wealth of research that does not.


"Indeed. You only mention one reason for tests (the afore-mentioned "plans")."

Yes, but I was commenting on a piece; there is little reason for me to be redundant with what you've already said.


"I'm concerned, specifically, with minimizing subjectivity. I want the doctors who examine me, and the surgeons who operate on me, to be as objective as possible in their professional judgments."

Then you should want them to not have gone to medical school. All the things they learn create a framework or lens through which they view the human body. I'd argue I want my doctors to be highly biased toward what I regard as robust medical knowledge. Yet, even this knowledge is subjective (there are all sorts of assumptions and values that impact the science behind medicine). I'm okay with the assumptions that have been made because they've been very effective. But that doesn't mean the assumptions have not been made.


"What is ultimately decisive is their predictive power. If the AP becomes less accurate in predicting performance in college-level classes, colleges will stop taking it seriously."

So if two things correlate, that equals objectivity? That makes no sense. It might mean the correlation makes sense, but it doesn't make it an objective fact. When two (or more) things correlate, we decide how to measure the items (a subjective decision). We decide which items to measure (a subjective decision). Why don't we measure height and compare it to college success? Because it doesn't make sense to us - based on the subjective lens through which we view the world. We have a lot of things that make a lot of sense to us and we have robust frameworks that have been very successful, but that doesn't make them objective. Instead, it is the kinds of assumptions we make that make ideas successful, not the removal of assumption.


"People's egos don't get worked up about tests that are known to be highly subjective."

Not sure what your point is here, but, yes, they do. Consider subjective ratings of anything (TV, movies, books, etc.). People get very worked up about things in which assumptions are clearly being made. Yet, as noted in my response to LexAequitas, there are assumptions everywhere, we just have to find them.


"No, the speed of light is *not* relative to the observer. That's what's special about it."

Yes, it is. This is Einstein's most misunderstood idea. Light always travels 3.0 X 10^8 m/s *relative* to the observer. So, if a car is traveling at light speed, the headlights would still light up the road because the observer (the driver) would see the light go out at light speed, but "objectively" this light would be going double the speed of light. So our observations are relative.

Jerrid Kruse said...

"But A is more subjective than B because it doesn't square with what humans have observed, and because the predictions that follow from it aren't as accurate as the predictions that follow from B."

The fact that humans are trying to "square" with what they already know is exactly what makes all of science subjective in nature. That doesn't mean we can't trust it. The best scientists (ones who have revolutionized a field) are usually very aware of the assumptions made in science; it is by questioning these assumptions, and making new ones, that revolutions often happen. So, the subjective nature of science is, in some way, required to move forward in science.

"Returning to tests, a test whose "correct answers" include answers like A is is more subjective than a test whose "correct answers" include answers like B."

This is a logical trap. "B is more right, therefore tests with B are better tests"???


"Yes, and greater accuracy is correlated with.... *less* subjective bias."

No, things with greater accuracy are correlated with greater utility. The ideas are still littered with subjectivity. As I noted before, this is OK, but to deny the subjectivity misrepresents how scientists work, falsely raises science above other ways of knowing, and might inadvertently limit scientific progress.


"Falsifiability standards, however, are regular cited in critiques of grand cosmological theories, e.g., String Theory and the Anthropic Principle."

Yes, and String Theory is gaining much traction - not because it is necessarily becoming more falsifiable, but because it is starting to explain more and more. Falsifiability is a limited test (as are all tests).

Katharine Beals said...

"Wouldn't good plans be based on good outcomes?"

Educational outcomes trump educational plans. Ask any parent or employer.

"Then you should want them to not have gone to medical school." NOT.

"So if two things correlate, that equals objectivity?"

Nope. What I said was that the AP is highly predictive of success in college level classes. A 5 in AP Chemistry strongly predicts which college chemistry class a student should take. His or her height does not. If it did, we could use height as a measure, too.

"People get very worked up about things in which assumptions are clearly being made."

People's *egos* would not be at stake if they thought that SAT scores were highly distorted by highly subjective judgments. (They might well still get worked up, but for different reasons.)

"Light always travels 3.0 X 10^8 m/s *relative* to the observer."

And independently of the observer.

So the speed of light is independent of the observer. Under no circumstance is it doubled.

"This is a logical trap. "B is more right, therefore tests with B are better tests"??? "

It's not a logical trap: it's a noncontroversial statement. People generally prefer tests in which people lose points only for incorrect answers.

"No, things with greater accuracy are correlated with greater utility. The ideas are still littered with subjectivity."

Something can be simultaneously subjective and less subjective than something else.

"falsely raises science above other ways of knowing"

Such as?

CA said...

You are right when you say that tests are accurate barometers of who will do well in college. The SATs are actually excellent predictors of this. So, for college entrance and the like, MC tests may be just fine. They can be good predictors of who has and who doesn't have the skills to succeed in college. They separate out those who put in a lot of effort from those who don't.

I'm an immigrant and I think Multiple Choice tests aren't that great when it comes to testing understanding of material in the classroom. They test memorization skills much more than knowledge and understanding. I went to college in the US and found that I could ace tests, even when I didn't understand the material very well. I couldn't do this on tests I did in school growing up, which required a written answer.

I'll give an example. In Economics, in the US, you might get a question like: Capitalism is
A. The private ownership of the means of production
B. The public ownership of the means of production
C. The seat of government in a nation
D. None of the above

If you simply memorize the phrase capitalism is the private ownership of the means of production, you can get the answer right. The problem is the test taker might not understand what that actually means.

When I went to school, I had to answer questions like:
Contrast Capitalism and Communism or List the advantages and disadvantages of Capitalism.

Questions like this require real knowledge and understanding. You can't answer them effectively just by memorizing a lot of key phrases. You can answer a lot of multiple choice tests without a lot of knowledge or understanding. I earned many A's in college without properly grasping the material. Often it came down to eliminating obvious incorrect answers and guessing between two that were left.

It may be that MC tests can be designed to actually test real knowledge and understanding. But all MC tests, by their very nature, hand the answers to the student. They aren't being asked to come up with the answer completely on their own. And, I'm sure few teachers or college professors would know how to design a really effective test.

I've seen the problems with my own daughter. Often she can't come up with correct answers on her own. But when given multiple choice options, she always gets the answers right. I feel that MC is not asking her to think or form any understanding on her own.

Understandably, the issue of objectivity in grading does come in. Open tests may not be completely fair grade-wise. But they are much better from an educational point-of-view because students have to know more and understand more to pass them. I would prefer that my kids have to take tests that are less fair but that demand real learning and understanding over tests that may be more fair but ask them to memorize a lot of key phrases that may be meaningless to them.

Katharine Beals said...

I'm all for avoiding multiple choice whenever feasible. (Cf. my GrammarTrainer program, which only uses multiple choice in its "teaching phase.")

And essays are great-- so long as we can trust those who grade them to make good judgments. In the case of SAT essays, it's far from clear whether this trust is warranted.