Teachers are turning to essay-grading software to critique student writing, but critics point to serious flaws in the technology
Jeff Pence knows the easiest way for his 7th grade English students to improve their writing is to do more of it. But with 140 students, it would take him at least two weeks to grade a batch of their essays.
So the Canton, Ga., middle school teacher uses an online, automated essay-scoring program that allows students to get feedback on their writing before handing in their work.
“It doesn’t tell them what to do, but it points out where issues may exist,” said Mr. Pence, who says the Pearson WriteToLearn program engages the students almost like a game.
With the technology, he has been able to assign an essay a week and individualize instruction efficiently. “I feel it’s pretty accurate,” Mr. Pence said. “Is it perfect? No. But when I reach that 67th essay, I’m not real accurate, either. As a team, we are pretty good.”
With the push for students to become better writers and meet the new Common Core State Standards, teachers are eager for new tools to help out. Pearson, which is based in London and New York, is one of several companies upgrading its technology in this space, also known as artificial intelligence, AI, or machine-reading. New assessments to test deeper learning and move beyond multiple-choice answers are also fueling the demand for software to help automate the scoring of open-ended questions.
Critics contend the software doesn’t do much more than count words and therefore can’t replace human readers, so researchers are working hard to improve the software algorithms and counter the naysayers.
While the technology has been developed primarily by companies in proprietary settings, there has been a new focus on improving it through open-source platforms. New players in the market, such as the startup venture LightSide and edX, the nonprofit enterprise started by Harvard University and the Massachusetts Institute of Technology, are openly sharing their research. Last year, the William and Flora Hewlett Foundation sponsored an open-source competition to spur innovation in automated writing assessments that attracted commercial vendors and teams of scientists from around the world. (The Hewlett Foundation supports coverage of “deeper learning” issues in Education Week.)
“We are seeing a lot of collaboration among competitors and individuals,” said Michelle Barrett, the director of research systems and analysis for CTB/McGraw-Hill, which produces the Writing Roadmap for use in grades 3-12. “This unprecedented collaboration is encouraging a lot of discussion and transparency.”
Mark D. Shermis, an education professor at the University of Akron, in Ohio, who supervised the Hewlett contest, said the meeting of top public and commercial researchers, along with input from a variety of fields, could help boost performance of the technology. The recommendation from the Hewlett trials is that the automated software be used as a “second reader” to monitor the human readers’ performance or provide additional information about writing, Mr. Shermis said.
“The technology can’t do everything, and nobody is claiming it can,” he said. “But it is a technology that has a promising future.”
The first automated essay-scoring systems go back to the early 1970s, but there wasn’t much progress made until the 1990s with the advent of the Internet and the ability to store data on hard-disk drives, Mr. Shermis said. More recently, improvements have been made in the technology’s ability to evaluate language, grammar, mechanics, and style; detect plagiarism; and provide quantitative and qualitative feedback.
The computer programs assign grades to writing samples, sometimes on a scale of 1 to 6, in a variety of areas, from word choice to organization. Some products give feedback to help students improve their writing. Others can grade short answers for content. To save time and money, the technology can be used in various ways on formative exercises or summative tests.
The Educational Testing Service first used its e-rater automated-scoring engine for a high-stakes exam in 1999 for the Graduate Management Admission Test, or GMAT, according to David Williamson, a senior research director for assessment innovation for the Princeton, N.J.-based company. It also uses the technology in its Criterion Online Writing Evaluation Service for grades 4-12.
Over the years, the capabilities have changed substantially, evolving from simple rule-based coding to more sophisticated software systems. And statistical techniques from computational linguistics, natural language processing, and machine learning have helped develop better ways of identifying certain patterns in writing.
But challenges remain in coming up with a universal definition of good writing, and in training a computer to understand nuances such as “voice.”
In time, with larger sets of data, more experts can identify nuanced aspects of writing and improve the technology, said Mr. Williamson, who is encouraged by the new era of openness about the research.
“It’s a hot topic,” he said. “There are a lot of researchers in academia and industry looking into this, and that’s a good thing.”
In addition to using the technology to improve writing in the classroom, West Virginia employs automated software for its statewide annual reading language arts assessments for grades 3-11. The state has worked with CTB/McGraw-Hill to customize its product and train the engine, using thousands of papers it has collected, to score the students’ writing based on a specific prompt.
“We are confident the scoring is very accurate,” said Sandra Foster, the lead coordinator of assessment and accountability in the West Virginia education office, who acknowledged facing skepticism initially from teachers. But many were won over, she said, after a comparability study showed that a trained teacher paired with the scoring engine scored more accurately than two trained teachers. Training involved a few hours in how to assess the writing rubric. Plus, writing scores have gone up since implementing the technology.
Automated essay scoring is also used on the ACT Compass exams for community college placement, the new Pearson General Educational Development tests for a high school equivalency diploma, and other summative tests. But it has not yet been embraced by the College Board for the SAT or by the rival ACT college-entrance exams.
The two consortia delivering the new assessments under the Common Core State Standards are reviewing machine-grading but have not committed to it.
Jeffrey Nellhaus, the director of policy, research, and design for the Partnership for Assessment of Readiness for College and Careers, or PARCC, wants to know if the technology will be a good fit with its assessment, and the consortium will be conducting a study based on writing from its first field test to see how the scoring engine performs.
Likewise, Tony Alpert, the chief operating officer for the Smarter Balanced Assessment Consortium, said his consortium will evaluate the technology carefully.
With his new company LightSide, in Pittsburgh, owner Elijah Mayfield said his data-driven approach to automated writing assessment sets itself apart from other products on the market.
“What we are trying to do is build a system that, instead of correcting errors, finds the strongest and weakest sections of the writing and where to improve,” he said. “It is acting more as a revisionist than a textbook.”
The new software, which is available on an open-source platform, is being piloted this spring in districts in Pennsylvania and New York.
In higher education, edX has just introduced automated software to grade open-response questions for use by teachers and professors through its free online courses. “One of the challenges in the past was that the code and algorithms were not public. They were seen as black magic,” said company President Anant Agarwal, noting the technology is in an experimental stage. “With edX, we put the code into open source where you can see how it is done to help us improve it.”
Still, critics of essay-grading software, such as Les Perelman, want academic researchers to have broader access to vendors’ products to evaluate their merit. Now retired, the former director of the MIT Writing Across the Curriculum program has studied some of the devices and was able to get a high score from one with an essay of gibberish.
“My main concern is that it doesn’t work,” he said. While the technology has some limited use with grading short answers for content, it relies too much on counting words, and reading an essay requires a deeper level of analysis best done by a human, contended Mr. Perelman.