Andrich, D. (1978). A Rating Formulation for Ordered Response Categories. Psychometrika, 43(4): 561-573.
Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Baghaei, P. & Amrahi, N. (2009). Introduction to Rasch Measurement. The Iranian EFL Journal, 5: 139-154.
Barkaoui, K. (2010). Do ESL Essay Raters’ Evaluation Criteria Change with Experience? A Mixed-Methods, Cross-Sectional Study. TESOL Quarterly, 44(1): 31-57.
Basturk, R. (2008). Applying the Many-Facet Rasch Model to Evaluate PowerPoint Presentation Performance in Higher Education. Assessment & Evaluation in Higher Education, 33(4): 431-444.
Du, Y. et al. (1996). Differential Facet Functioning Detection in Direct Writing Assessment. Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Eckes, T. (2009). Many-Facet Rasch Measurement. In S. Takala (Ed.), Reference Supplement to the Manual for Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (Section H). Strasbourg, France: Council of Europe/Language Policy Division. Retrieved from http://www.coe.int/t/dg4/linguistic/Source/CEF-refSupp-SectionH.pdf
Engelhard, G. (1994). Examining Rater Errors in the Assessment of Written Composition with a Many-Faceted Rasch Model. Journal of Educational Measurement, 31(2): 93-112.
Esfandiari, R. & Myford, C. M. (2013). Severity Differences Among Self-Assessors, Peer-Assessors, and Teacher Assessors Rating EFL Essays. Assessing Writing, 18(2): 111-131.
Farrokhi, F. et al. (2012). A Many-Facet Rasch Measurement of Differential Rater Severity/Leniency in Three Types of Assessment. JALT Journal, 34(1): 79-102.
Knoch, U. (2011). Investigating the Effectiveness of Individualized Feedback to Rating Behavior: A Longitudinal Study. Language Testing, 28(2): 179-200.
Linacre, J. M. (1989/1994). Many-Facet Rasch Measurement. Chicago: MESA Press.
Linacre, J. M. (1997). Judging Plans and Facets (Research Note No. 3). Chicago: University of Chicago, MESA Psychometric Laboratory. Retrieved from http://www.rasch.org/rn3.htm
Linacre, J. M. (2002). Optimizing Rating Scale Category Effectiveness. Journal of Applied Measurement, 3(1): 85-106.
Linacre, J. M. (2014). FACETS (Version 3.71.4) [Computer Software]. Chicago, IL: MESA Press.
Lumley, T. (2005). Assessing Second Language Writing: The Rater’s Perspective. Frankfurt am Main: Peter Lang.
Lumley, T. & McNamara, T. F. (1995). Rater Characteristics and Rater Bias: Implications for Training. Language Testing, 12(1): 54-71.
Masters, G. (1982). A Rasch Model for Partial Credit Scoring. Psychometrika, 47(2): 149-174.
Matsuno, S. (2009). Self-, Peer-, and Teacher-Assessments in Japanese University EFL Writing Classrooms. Language Testing, 26(1): 75-100.
McNamara, T. F. (1996). Measuring Second Language Performance. New York, NY: Longman.
Myford, C. M. (1991). Judging Acting Ability: Moving from Novice to Expert. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.
Myford, C. M. & Wolfe, E. W. (2004). Detecting and Measuring Rater Effects Using Many-Facet Rasch Measurement: Part II. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch Measurement (pp. 460-517). Maple Grove, MN: JAM Press.
O’Neill, T. K. & Lunz, M. E. (1997). A Method to Compare Rater Severity Across Several Administrations. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.
Schaefer, E. (2008). Rater Bias Patterns in an EFL Writing Assessment. Language Testing, 25(4): 465-493.
Scullen, S. E. et al. (2000). Understanding the Latent Structure of Job Performance Ratings. Journal of Applied Psychology, 85: 956-970.
Smith, R. (2004). Fit Analysis in Latent Trait Measurement Models. In E. Smith & R. Smith (Eds.), Introduction to Rasch Measurement (pp. 51-83). Maple Grove, MN: JAM Press.
Wigglesworth, G. (1993). Exploring Bias Analysis as a Tool for Improving Rater Consistency in Assessing Oral Interaction. Language Testing, 10(3): 305-335.
Wigglesworth, G. (1994). Patterns of Rater Behaviour in the Assessment of an Oral Interaction Test. Australian Review of Applied Linguistics, 17(2): 77-103.
Winke, P. et al. (2013). Raters’ L2 Background as a Potential Source of Bias in Rating Oral Performance. Language Testing, 30(2): 231-252.
Wright, B. D. & Stone, M. H. (1979). Best Test Design. Chicago: MESA Press.