A Many-Facet Rasch Measurement of Bias among Farsi-Native Speaking Raters toward Essays Written by Non-Native Speakers of Farsi

Document Type: Research Paper

Abstract

In this study, we used a many-facet Rasch measurement approach to investigate the errors native speakers of Farsi commit when rating essays written by non-native speakers of Farsi. To that end, two native speakers of Farsi rated 56 essays that 28 male and female advanced learners of Farsi as a second language wrote on two topics in winter 2014 at the Persian Language Centre at Imam Khomeini International University in Qazvin, Iran. The raters scored the essays using a researcher-made, 5-point analytic rating scale, and we analyzed the resulting ratings using FACETS. The analyses showed that the raters were biased towards both topics. The results also revealed raters’ bias towards coherence only. Statistically significant differences were found in raters’ bias towards essays. The findings suggest that raters should be given strict guidelines so that they rate accurately and consistently, and they can inform rater training aimed at reducing the errors raters may commit.
