TY - CHAP
T1 - Assessing oral presentations and interactions
T2 - From a systematic to a salient-feature approach
AU - Berger, Armin
PY - 2022
Y1 - 2022
AB - Most rating scales for performance assessment distinguish between levels by systematically replacing abstract qualifiers such as some, many, or most at each band (the systematic approach). Less frequently, distinctions are based on concrete aspects of performance characteristic of the band concerned (the salient-feature approach). This chapter presents a study which compares and contrasts the two approaches. The main aim was to evaluate whether, for the purpose of assessing academic presentation and interaction skills in an undergraduate speaking course, rating scales featuring salient aspects of performance are more reliable than rating scales which distinguish between the levels systematically. Both qualitative and quantitative methods were employed to evaluate the effectiveness of the scales. In phase one, the scores of 60 live-exam performances rated on the basis of systematic scales were compared to the scores of 84 mock-exam performances rated on the basis of salient-feature scales. The latter had two formats: first as six-point scales with every band (except the lowest) defined by descriptors, and then as ten-point scales with unworded bands in between. Many-facet Rasch analysis showed that the salient-feature scales are generally superior in terms of rater reliability and criteria separation. However, raters were unable to distinguish as many as ten bands reliably, although, according to interview data, they found the undefined intermediate levels very useful. The results have implications for scale revision, rater training, and future scale development.
KW - Descriptor formulation
KW - Group interviews
KW - Many-facet Rasch analysis
KW - Performance assessment
KW - Rating scale development and validation
UR - https://www.scopus.com/pages/publications/85146624980
U2 - 10.1007/978-3-030-79241-1_26
DO - 10.1007/978-3-030-79241-1_26
M3 - Chapter
SN - 978-3-030-79240-4
T3 - English Language Education
SP - 297
EP - 321
BT - Developing advanced English language competence
A2 - Berger, Armin
A2 - Heaney, Helen
A2 - Resnik, Pia
A2 - Rieder-Bünemann, Angelika
A2 - Savukova, Galina
PB - Springer
CY - Cham
ER -