NATURAL LANGUAGE PROCESSING (NLP) APPLICATIONS FOR ERROR ANALYSIS IN LEARNING INDONESIAN FOR FOREIGN SPEAKERS (BIPA)
Growing global demand for Indonesian for Foreign Speakers (BIPA) instruction calls for systematic, scalable error analysis to guide pedagogical interventions, a task that manual correction cannot support at scale. This study aimed to develop and validate a specialized Natural Language Processing (NLP) framework that automatically classifies linguistic errors in BIPA learners' written output and generates a statistically generalizable error map for curriculum reform. The research used a corpus-based developmental design: a BIPA-Optimized NLP Error Classification Pipeline was built and validated on a corpus of over 500,000 words. The model achieved an F1-score of 0.89. Error density was high (7.2 errors per 100 words), and Affix Misapplication was the most persistent category, accounting for 45% of all errors. ANOVA showed no significant reduction in the rate of these errors across proficiency levels (p = 0.316), indicating that exposure alone is insufficient to resolve them. The study concludes that the NLP pipeline provides the first objective diagnostic standard for BIPA pedagogy and that the difficulty is structural rather than a matter of limited exposure. These findings call for a shift toward systematic, targeted remediation focused on the most persistent error sub-types, enabling evidence-based curriculum development.
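The metrics reported in the abstract (error density per 100 words, a category's share of all errors, and the F1-score) can be illustrated with a minimal Python sketch. The function names and all counts below are hypothetical, chosen only so that the arithmetic reproduces the headline figures. They are not the study's data, and this is not the authors' pipeline.

```python
from collections import Counter


def error_density(n_errors: int, n_words: int) -> float:
    """Return the number of errors per 100 words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100.0 * n_errors / n_words


def category_shares(labels):
    """Return each error category's fraction of all labelled errors."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy values (assumptions, not data from the study).
toy_labels = ["affix"] * 9 + ["word_order"] * 6 + ["spelling"] * 5

print(error_density(n_errors=36, n_words=500))   # errors per 100 words
print(category_shares(toy_labels)["affix"])      # share of affix errors
print(f1_score(tp=89, fp=11, fn=11))             # classifier F1
```

Density is normalized per 100 words so that texts of different lengths can be compared. The share is computed over errors rather than over words, which matches the abstract's phrasing "45% of all errors".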
Copyright (c) 2025 Inriati Lewa, Zain Nizam, Andres Villanueva, Le Hoang Nam

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.