Student essay grading done by computer – Really?



The latest innovation in the for-profit education industry is the introduction of the Robo-Reader, an automated essay scoring program that will take over what dedicated professors – and hungry graduate assistants – formerly had to labor over assiduously. Remember that word, assiduously.

Essay questions were always the bane of the indifferent student. True-false, multiple choice, at least those tests offered the hope, however illusory, that you could cram at the end of term and recognize enough to make a reasonable stab at the answer. Or at least outright guess.

The essay test, however, was the great leveler among students. The professor would examine your cramped handwriting – often writ large to create the illusion of having written more – perhaps noting the dried beads of perspiration on the paper – and begin dissecting your feeble attempts at cogency and scholarship.

This would lead to the flippant retort quickly adopted by all students when asked, “How’d you do?”

With an insouciant flip of the head and a knowing smile, the erstwhile scholar would reply:

“Oh, I faked it.”

These four words, used universally and without reservation, are code for:

A. I didn’t study.

B. I dazzled the prof with my lexicological legerdemain.

C. I didn’t have a clue what I was writing about.

D. All of the above.

That may soon be a thing of the past with the development of computer programs that will “read” and grade college and high school essays. Tagged Robo-Readers, the programs are intended to machine-grade the writing exams that are part of testing aligned with the Common Core, the basis for curricula in 24 states.

With more writing exams expected to roll out for these high school standardized tests (called “high stakes” tests in several articles), and thousands or hundreds of thousands of essays to grade, Robo-Readers may be the only cost-effective way to handle the load.

The programs can recognize sentence structure, complexity of vocabulary and key subject words. And they know the word count.

They will give good scores for well-organized, grammatically correct, wordy essays.

What they can’t recognize or understand is content. If a student writes that the Civil War started in 1812, the software doesn’t “know” the date is wrong. What count are bigger words, longer sentences and longer essays.
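To see why that matters, here is a minimal sketch of the kind of surface features such a program rewards. The weights and feature choices below are invented for illustration – no actual Robo-Reader code is described in this article – but each feature is one the article names, and none of them checks whether a claim is true.

```python
import re

def surface_score(essay: str) -> float:
    """Toy essay scorer using only surface features (illustrative weights).

    Rewards bigger words, longer sentences and longer essays; it has no
    notion of factual content, so a wrong date changes nothing.
    """
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)   # "complexity of vocabulary"
    avg_sent_len = len(words) / len(sentences)               # "sentence structure"
    word_count = len(words)                                  # "they know the word count"
    return 1.0 * avg_word_len + 0.5 * avg_sent_len + 0.01 * word_count

plain = surface_score("The Civil War started in 1861.")
padded = surface_score(
    "The internecine American Civil War, notwithstanding popular "
    "misconception, commenced in 1812 according to my recollection."
)
```

Run it and the padded, factually wrong essay outscores the short, correct one – exactly the failure mode critics worry about.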

Mark D. Shermis, dean of the University of Akron School of Education, is at the center of this controversy, saying that the programs are as accurate as human scorers.

Shermis has claimed, “A direct comparison between human graders and software designed to score student essays achieved virtually identical levels of accuracy, with the software in some cases proving to be more reliable.”

That is a sweeping statement that is bound to make education administrators absolutely drool at the prospect. Now essays can be assessed with the same speed and accuracy as the math section of the SAT.

Or can they? The methodology used to support Shermis’ conclusions has been called into question, and obviously much more research should be done. Shermis’ work says that, statistically, the machine scores match human-scored work.

But they are not scored on the same criteria, and the studies do not compare individual scorings. So while the numbers may be “right,” they don’t indicate how good the individual really is.

We already are faced with the “dumbing down” of education standards in testing. No Child Left Behind has set impossibly high standards that few if any school districts will meet in the next two years. If the standards can’t be lowered, ladders will simply be put up so students can step over them.

We are already faced with teachers forced to teach to the test. Now we would create students who would write to the Robo-Reader.

If content doesn’t matter, critical thinking doesn’t matter either. The essay becomes a grammatical exercise.

Supporters say students are not that clever. And if they are, the preparation it takes to game the test will itself demand more creativity and thought than the test is designed to measure. Right.

Use longer words, longer sentences. Write “assiduously” rather than “diligently.” Use a phrase like “lexicological legerdemain” rather than “word magic,” and you too can ace the exam.

In other words, “faking it” will work.
