The Construct Validity of the Writing Skill Scoring Rubric in the Comprehensive Persian Test of Ferdowsi University for Non-Persian Speakers (Scientific Research Article)

Article Type: Research Article

Authors

1 Corresponding author; PhD in Persian Language and Literature, Ferdowsi University of Mashhad

2 Assistant Professor, Department of Persian Language and Literature, Ferdowsi University of Mashhad

3 Department of English Language and Literature, Ferdowsi University of Mashhad

Abstract

Language assessment is one of the fundamental pillars of any language teaching system, and much of the effectiveness of educational centers depends on the use of valid assessment methods. The present study examines the construct validity of the scoring rubric for the writing skill in the official end-of-course test of the Persian Language Center of Ferdowsi University. To this end, the results of one of the tests administered at this university were analyzed using the Rasch statistical model and factor analysis. The factor analysis showed that the three constructs of language quality, cohesion, and topic development are highly valid for assessing the writing skill: cohesion had the highest factor loading, 0.98, and the other two constructs, each at 0.97, had the second highest. The scoring rubric of this proficiency test uses a six-point scale to score each of the constructs. The Rasch model showed that each of the raters was able to use this scale fairly accurately, since the order of the thresholds matches the order of the scores, with no disordering. The person-item (Wright) map, in turn, showed that this scoring scale can distinguish weak, intermediate, and strong test takers from one another, and that the scoring components and categories (0 to 5) cover the entire range of test takers' ability, so the scale can measure all candidates at any level of writing ability. The rater reliability of this test was estimated at 0.96, a very satisfactory figure.

Article Title [English]

The Construct Validity of the Writing Skill Scoring Rubric in the Persian Proficiency Test of Ferdowsi University of Mashhad

Authors [English]

  • Mohsen Roudmajani 1
  • Ehsan Ghabool 2
  • Behzad Ghonsooly 3
1 Corresponding author; PhD graduate in Persian Language and Literature, Ferdowsi University of Mashhad, Iran
2 Assistant Professor, Department of Persian Language and Literature, Ferdowsi University of Mashhad, Iran
3 Full Professor, Department of English Language and Literature, Ferdowsi University of Mashhad, Iran
Abstract [English]

Language assessment is an essential part of any language teaching syllabus, and the effectiveness of language educational systems depends highly on the validity of language assessment methods. This research studies the construct validity of the writing scoring rubric in Ferdowsi University's Persian proficiency test. For this purpose, the results of one of the official tests administered at the Ferdowsi International Center for Teaching Persian to Non-Persian Speakers were analyzed with the Rasch model and factor analysis. The results showed that the three constructs, topic development, language quality, and cohesion, all have high factor loadings: cohesion has the highest loading at 0.98, while the other two constructs each load at 0.97. To score each construct, the writing rubric in this test uses a six-point scale. The Rasch model showed that all raters used this scale appropriately, because the thresholds are ordered in accordance with the scores. The Wright map also indicates that this scale can distinguish well between weak, intermediate, and strong examinees, and that the score categories cover test takers at all levels of ability. The rater reliability for this test is 0.96, which is highly acceptable.

Extended Abstract:
Language assessment is one of the most important parts of any language educational system. Much of the effectiveness of Persian language centers depends on the use of precise assessment and evaluation techniques. In fact, Persian language institutions face the fundamental question of how to convert abstract concepts of linguistic knowledge and communicative ability into numbers. Understanding learners' progress, identifying their weaknesses, and making sound decisions about them require accurate and scientific methods of assessment and evaluation. The present study evaluates the construct validity of the writing skill assessment rubric in the Ferdowsi Persian language examination. The test, which is approved by the Ministry of Science, is held twice a year at Ferdowsi University of Mashhad, in Iraq, and in some other countries. This research attempts to answer the following three questions:
1. To what extent do the constructs defined in the scoring rubric measure distinct components of the writing skill?
2. To what extent can the six-point scale distinguish weak, intermediate, and strong test takers?
3. To what extent do the scorers agree on the use of the scoring criteria?
So far, several scoring rubrics have been proposed for assessing the writing ability of non-native learners of Persian, but none of them has been validated with quantitative research methods. The aim of the current study was to investigate the construct validity of the writing section of Ferdowsi University's Persian language proficiency test. This test is designed on the TOEFL's theoretical underpinnings. The writing scoring rubric of the test consists of three components, namely language quality, cohesion, and topic development. This rubric is clearly derived from a communicative view of language, which not only measures the test taker's linguistic knowledge at the word, sentence, and discourse levels but also assesses the ability to perform language tasks. Based on this rubric, in addition to the quality of the language and the cohesion of the text, the raters examine to what extent the written text is consistent with the purpose of the language task.
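As a concrete illustration of the rubric's architecture, the sketch below models the three constructs and the six-point scale as a small Python data structure. This is a minimal sketch, not the center's scoring software: the construct names follow the paper, while the unweighted total is only one plausible aggregation rule, since the paper does not specify how the subscores are combined.

```python
from dataclasses import dataclass

# Construct names follow the paper's rubric; 0-5 is its six-point scale.
CONSTRUCTS = ("language_quality", "cohesion", "topic_development")
SCALE = range(0, 6)  # six ordered categories: 0, 1, 2, 3, 4, 5

@dataclass
class RubricRating:
    language_quality: int
    cohesion: int
    topic_development: int

    def __post_init__(self):
        # Reject scores outside the six-point scale.
        for name in CONSTRUCTS:
            if getattr(self, name) not in SCALE:
                raise ValueError(f"{name} must be an integer from 0 to 5")

    def total(self) -> int:
        # Unweighted sum (0-15); the paper does not state an aggregation rule.
        return sum(getattr(self, name) for name in CONSTRUCTS)

# Example: one rater's scores for a single script.
print(RubricRating(language_quality=4, cohesion=3, topic_development=4).total())  # 11
```

Keeping each construct as a separate field mirrors the analytic, multi-trait character of the rubric: a rater assigns three scores per script rather than one holistic mark.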
In order to evaluate the construct validity of the scoring rubric, the results of one of the tests held at the Ferdowsi International Persian Language Center were analyzed with the Rasch statistical model and factor analysis. The test was held on July 8, 2018 (1397 SH) at the International Center for Persian Language at Ferdowsi University of Mashhad and at the University of Strasbourg in France. The writing section of the test includes two tasks. In the first task, an audio file was played for the test takers, who were then asked to write a summary of it. In the second task, the test takers were given a topic to write about in 200 words.
The participants in this study were 106 students: 30 women and 76 men. Iraq, with 50 participants, and Pakistan, with 30, had the first and second largest groups. The other participants were from India, Indonesia, Lebanon, Syria, and Italy, with 13, 2, 2, 4, and 5 participants, respectively. In terms of educational background, the humanities, with 68 participants, constituted the largest group; engineering, with 23, and medicine, with 11, came next.
Factor analysis showed that the three identified constructs all have high validity; that is, writing skill can be divided into language quality, cohesion, and topic development, and each can be scored separately. Cohesion, at 0.98, had the highest factor loading, and the other two constructs, each at 0.97, had the second highest. Based on these statistics, each of these components measures a separate construct of writing proficiency.
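The paper does not publish its data or analysis scripts, but the hedged sketch below shows how factor loadings of this kind can be computed in Python with the third-party factor_analyzer package. The file name writing_scores.csv and the column names are hypothetical stand-ins for the per-construct scores.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

# Hypothetical data: one row per script, with the three construct scores.
scores = pd.read_csv("writing_scores.csv")[
    ["language_quality", "cohesion", "topic_development"]
]

# A one-factor solution: high loadings on a single common factor would support
# treating all three constructs as valid indicators of writing ability.
fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(scores)

for name, loading in zip(scores.columns, fa.loadings_.ravel()):
    print(f"{name}: {loading:.2f}")  # the paper reports 0.98 and 0.97
```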
The Ferdowsi Persian language proficiency test uses a six-point scoring scale. The Rasch model showed that each of the scorers was able to use this scale fairly accurately, because the thresholds were ordered in accordance with the scores and showed no disordering. The Wright map, in turn, indicated that this scoring scale can distinguish between weak, intermediate, and strong test takers, and that the 0 to 5 categories cover the full range of test-taker ability, so the scale can measure candidates at any level of writing ability. Finally, the rater reliability was 0.96, which is very satisfactory. This result indicates that the raters applied the scoring criteria in the same way, so the test has acceptable scorer reliability.
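The 0.96 figure comes from the paper's Rasch analysis of rater behavior. As a simpler cross-check of scorer agreement, not the method the authors used, an intraclass correlation can be computed from the same ratings with the pingouin package; the long-format layout and file name below are hypothetical.

```python
import pandas as pd
import pingouin as pg  # pip install pingouin

# Hypothetical long-format ratings: one row per (script, rater) pair,
# assuming every rater scored every script.
ratings = pd.read_csv("ratings_long.csv")  # columns: script, rater, score

icc = pg.intraclass_corr(
    data=ratings, targets="script", raters="rater", ratings="score"
)
# ICC2k: reliability of the averaged ratings over the same set of raters.
print(icc.set_index("Type").loc["ICC2k", "ICC"])
```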

Keywords [English]

  • Language assessment
  • Construct validity
  • Writing skill
  • AZFA
References:
Attali, Y. (2007). Construct validity of e-rater in scoring TOEFL essays (TOEFL Research Rep. No. 07-21). Princeton, NJ: Educational Testing Service.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2004). Statistical analyses for language assessment. New York: Cambridge University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. New York: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. New York: Oxford University Press.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Canale, M. (1983). From communicative competence to communicative language pedagogy. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 2-27). London: Longman.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1-47.
Celce-Murcia, M., Dörnyei, Z., & Thurrell, S. (1995). Communicative competence: A pedagogically motivated model with content specifications. Issues in Applied Linguistics, 6(2), 5-35.
Chapelle, C., Grabe, W., & Berns, M. (1997). Communicative language proficiency: Definition and implications for TOEFL 2000 (TOEFL Monograph No. 10). Princeton, NJ: Educational Testing Service.
Cumming, A., Kantor, R., Powers, D., Santos, T., & Taylor, C. (2000). TOEFL 2000 writing framework: A working paper (TOEFL Monograph No. 18). Princeton, NJ: Educational Testing Service.
Cumming, A., Kantor, R., & Powers, D. E. (2001). Scoring TOEFL essays and TOEFL 2000 prototype writing tasks: An investigation into raters’ decision making and development of a preliminary analytic framework. (TOEFL Monograph No. 22). Princeton, NJ: Educational Testing Service.
Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. Modern Language Journal, 86, 67-96.
ETS: Educational Testing Service. (2012). The official guide to the TOEFL test (4th ed.). New York: McGraw-Hill.
Erdosy, M. U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions (TOEFL Research Rep. No. 70). Princeton, NJ: Educational Testing Service.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. New York: Routledge.
Golpour, L. (2018). Developing a writing skill test for non-Persian learners: Approaches and analysis of errors. Journal of Teaching Persian to Speakers of Other Languages, 7(2), 45-68. [In Persian]
Golpour, L. (2014). Designing and validating a Persian proficiency test based on four language skills (PhD dissertation). Payame Noor University, Tehran, Iran. [In Persian]
Harris, D. P. (1969). Testing English as a Second Language. New York: McGraw-Hill Book Company.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
Jalili, A. (2017). Assessing advanced Persian language learners' written production: Developing a detailed rubric. Journal of Teaching Persian to Speakers of Other Languages, 6(1), 31-64. [In Persian]
Jalili, A. (2011). A Persian language proficiency test based on the four main language skills (MA thesis). Allameh Tabataba'i University, Tehran, Iran. [In Persian]
Kelley, T. L. (1927). Interpretation of educational measurements. New York: World Book Company.
Lado, R. (1961). Language Testing. London: Longman.
Linacre, J. M. (2009). A user’s guide to WINSTEPS. Chicago, IL: Winsteps.
Lissitz, R. W. (Ed.). (2009). The concept of validity: Revisions, new directions and applications. Charlotte, NC: Information Age Publishing.
Messick, S. (1987). Validity (Research Rep. No. RR-87-40). Princeton, NJ: Educational Testing Service.
Motavallian Nayini, R., & Ostovar Abarghouyi, A. (2014). The role of interference in the emergence of syntactic errors in the writing of Arabic-speaking learners of Persian. Journal of Foreign Language Research, 3(2). [In Persian]
Motavallian Nayini, R., & Malekian, R. (2014). Syntactic error analysis of Urdu-speaking learners of Persian. Journal of Teaching Persian to Speakers of Other Languages, 3(1), 29-56. [In Persian]
Mousavi, S. A. (2012). Item response theory. In An encyclopedic dictionary of language testing (5th ed.). Tehran: Rahnama.
Powers, D. E., Burstein, J. C., Chodorow, M., Fowles, M. E., & Kukich, K. (2000). Comparing the validity of automated and human essay scoring (GRE Research Rep. No. 98-08a). Princeton, NJ: Educational Testing Service.
Richards, J. C., & Rodgers, T. S. (2014). Approaches and methods in language teaching (3rd ed.). Cambridge: Cambridge University Press.
Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145-178.
Zhang, M., Breyer, F. J., & Lorenz, F. (2013). Investigating the suitability of implementing the e-rater scoring engine in a large-scale English language testing program (TOEFL Research Rep. No. 13-36). Princeton, NJ: Educational Testing Service.