Critical Evaluation of Two Articles that Describe the Inconsistencies in Rater Behaviors

Jun 24th, 2018
Rating essays is considered a complicated task because raters find it difficult to remain consistent in their decision-making process (DMP). This is because raters differ in qualities that affect their behavior when assessing writing. In their studies, several researchers identify these qualities, namely raters' proficiency level and experience, along with the rating task itself, as causes of inconsistency in rater performance. More recent studies suggest that the rating process and raters' cognitive processes are the main causes of this variability. Since the degree of variability determines both score validity and rater reliability, it is necessary to identify further causes of rater inconsistency in the DMP.

The purpose
While Barkaoui defines DMP as raters' behavior in relation to aspects of writing, Baker applies the term to the cognitive process involved in rating.

In terms of methodology, I find Baker's exploratory design the more interesting of the two, as it distinguishes itself through a strong emphasis on rater behavior. Consistent with its conceptual framework, Baker carefully added a GDMSI questionnaire and a 'deferred double score', a source of observational quantitative data, to offset the limitations of the study's controversial self-report scales in the qualitative protocols, yielding a more reliable design for data analysis (Baker, 2012, p. 229). Unlike Baker, Barkaoui relies solely on think-aloud protocols to collect his data. This underestimates the interference that think-aloud protocols can cause in the rating process (Barkaoui, 2010, p. 57). The small number of participants may have prevented the author from applying other counterbalancing methods in this case.

From a data collection perspective, Barkaoui's (2010) study fails to consider differences between native and non-native speakers, an external factor that may distort the study's results (p. 57). It is doubtful that such differences had no impact on the rating scores collected, given that other studies have identified them as an influential cause of variation and inconsistency in rating performance. Regarding data presentation, Baker (2012) demonstrates more distinctive details and