Randomness among Teacher Raters Rating Language Learners’ Essays: A FACETS Approach

Article Type: Research Paper


Assistant Professor, Imam Khomeini International University


Abstract [English]

Inconsistency in raters’ ratings may invalidate test results, adversely affecting decisions about placing language learners into a higher level of education. In the present study, the researcher used the many-facet Rasch measurement model to examine how consistently teacher raters rated the essays written by language learners in their writing classes at Imam Khomeini International University. Each teacher rater rated 56 essays, using a researcher-made, 5-point analytic rating scale. The researcher analysed the data using FACETS, the Rasch-based computer programme for rating data. The results of the FACETS analysis, including separation indices and fit values, showed that the teacher raters were self-consistent in rating the essays the language learners wrote. The results of the single rater–rest of the raters analysis revealed that each teacher rater’s ratings were consistent with those of the other raters. These findings may carry implications for research and pedagogy, shedding light on rater training.
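The analysis described above rests on the many-facet Rasch model. A standard log-odds formulation of that model (the symbols below are the conventional ones from the Rasch literature, not notation taken from this article) can be written as:

$$\ln\!\left(\frac{P_{njk}}{P_{nj(k-1)}}\right) = B_n - C_j - F_k$$

where $P_{njk}$ is the probability that essay writer $n$ receives a rating in category $k$ from rater $j$, $B_n$ is the writer's ability, $C_j$ is the severity of rater $j$, and $F_k$ is the difficulty of the step from category $k-1$ to category $k$ on the rating scale. Rater self-consistency is then evaluated through fit statistics on the $C_j$ estimates, and inter-rater agreement through separation indices across raters.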

Keywords [English]

  • randomness
  • rating
  • rating scale
  • self-consistency