Brown, J. D., & Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly, 32(4), 653–675.
Douglas, D. (2010). Understanding language testing. Oxon: Hodder Education.
Eckes, T. (2011). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt, Germany: Peter Lang.
Engelhard, G., Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for all students: Development, implementation, and analysis (pp. 261–287). Mahwah, NJ: Lawrence Erlbaum Associates.
Esfandiari, R., & Myford, C. M. (2013). Severity differences among self-assessors, peer-assessors, and teacher assessors rating EFL essays. Assessing Writing, 18(2), 111–131.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Knoch, U., Read, J., & von Randow, J. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(1), 26–43.
Lim, G. S. (2011). The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters. Language Testing, 28(4), 543–560.
Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.
Linacre, J. M., Engelhard, G., Tatum, D. S., & Myford, C. M. (1994). Measurement with judges: Many-faceted conjoint measurement. International Journal of Educational Research, 21(6), 569–577.
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54–71.
Lunz, M. E., & Linacre, J. M. (1998). Measurement designs using multifacet Rasch modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 47–77). Mahwah, NJ: Erlbaum.
McNamara, T. F. (1996). Measuring second language performance. New York: Longman.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189–227.
Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85(6), 956–970.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.