Quality profile of Arabic final semester assessment items: A psychometric analysis

Authors

  • Zuliyah Safitri, UIN Sunan Ampel Surabaya
  • M. Baihaqi

DOI:

https://doi.org/10.30603/al.v11i1.7322

Keywords:

Arabic language assessment; content and construct validity; psychometric analysis; final semester test

Abstract

Background: The quality of assessment instruments is essential to ensure that students’ learning outcomes are measured accurately. In Arabic language learning, Final Semester Assessments (PAS) must be supported by sound psychometric qualities to function as valid and reliable evaluation tools.

Aims: This study aims to examine the quality profile of Arabic PAS items at MAN 1 Gresik by analysing their psychometric characteristics and identifying items that are feasible, need revision, or are not feasible for use.

Methods: This research employed a quantitative descriptive design using psychometric item analysis. The data consisted of 40 multiple-choice PAS items and students’ response sheets. The analysis integrated content and construct validity with empirical indicators, including point-biserial validity, KR-20 reliability, item difficulty, and item discrimination, using Microsoft Excel and ANATES V4.

Results: Content validity reached 92.5%, construct validity reached 82.85%, and empirical validity was moderate (r = 0.60). The overall test reliability was high (r₁₁ = 0.75). Item difficulty was dominated by medium-level items, while item discrimination was the weakest aspect. Based on integrated psychometric criteria, 40% of items are feasible, 57.5% require revision, and 2.5% are not feasible. The sources of item failure were shortcomings in content validity (7.5%), construct validity (42.5%), empirical validity (2.5%), level of difficulty (12.5%), and discrimination index (22.5%).

Implications: These findings highlight the importance of systematic psychometric evaluation in Arabic language assessment. Improvements are needed in construct validity, especially Arabic language accuracy, distractor effectiveness, and item discrimination. Such an approach supports the improvement of school-based Arabic assessments to ensure more valid and reliable measurement of students’ learning outcomes.
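The empirical indicators named in the Methods (item difficulty, item discrimination, point-biserial correlation, and KR-20 reliability) can be illustrated with a minimal Python sketch. This is not the authors' procedure, which used Microsoft Excel and ANATES V4. The function names, the 27% upper/lower grouping, the use of population variance, and the toy response matrix are all assumptions made for illustration only.

```python
from math import sqrt

def item_difficulty(responses):
    """Proportion of students answering each item correctly."""
    n, k = len(responses), len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]

def kr20(responses):
    """KR-20 reliability for dichotomous (0/1) items.
    Assumption: population variance of total scores is used."""
    n, k = len(responses), len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    if var == 0:
        raise ValueError("total scores have zero variance")
    pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - pq / var)

def point_biserial(responses, j):
    """Uncorrected point-biserial correlation between item j and the
    total score (the item itself is included in the total)."""
    n = len(responses)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    sd = sqrt(sum((t - mean) ** 2 for t in totals) / n)
    correct = [t for row, t in zip(responses, totals) if row[j] == 1]
    p = len(correct) / n
    if sd == 0 or p in (0.0, 1.0):
        raise ValueError("correlation undefined for this item")
    mean_correct = sum(correct) / len(correct)
    return (mean_correct - mean) / sd * sqrt(p / (1 - p))

def discrimination(responses, j, fraction=0.27):
    """Upper-lower discrimination index for item j.
    Assumption: groups are the top and bottom `fraction` of students."""
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:g], ranked[-g:]
    return (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / g

# Hypothetical toy data: 5 students (rows) x 4 items (columns), 1 = correct.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulty = item_difficulty(toy)
reliability = kr20(toy)
rpb_item0 = point_biserial(toy, 0)
d_item0 = discrimination(toy, 0)
```

In a workflow of this kind, each item's indicators would then be compared against chosen cut-offs to label it feasible, in need of revision, or not feasible; the specific thresholds applied in the study are not stated in the abstract.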

Published

2026-02-28

How to Cite

Zuliyah Safitri, & M. Baihaqi. (2026). Quality profile of Arabic final semester assessment items: A psychometric analysis. Al-Lisan: Jurnal Bahasa (e-Journal), 11(1), 87–102. https://doi.org/10.30603/al.v11i1.7322