A Comprehensive Review of Machine Learning Algorithms for Predicting Student Dropouts in Educational Data Mining

Authors

  • Rabia Bhutto Department of Software Engineering, Mehran University of Engineering and Technology Jamshoro, Pakistan
  • Kanwal Batool Department of Computer Systems Engineering, Mehran University of Engineering and Technology Jamshoro, Pakistan
  • Pireh Soomro Department of Software Engineering, Mehran University of Engineering and Technology Jamshoro, Pakistan
  • Muhammad Hasnain Ali Dad Artificial Intelligence, Mehran University of Engineering and Technology Jamshoro, Pakistan
  • Muhammad Noor Murtaza Qureshi Artificial Intelligence, Mehran University of Engineering and Technology (MUET) Jamshoro, Pakistan

Abstract

Student dropout is one of the major challenges faced by educational institutions, as it negatively affects academic performance, institutional reputation, and overall student success. This study presents a machine learning-based framework for predicting student dropout using educational data mining techniques. A publicly available dataset obtained from the UCI Machine Learning Repository was used for the experimental analysis. The dataset contains students' demographic information, academic performance, attendance records, assignment results, socio-economic background, and engagement-related attributes. Several machine learning algorithms, including Logistic Regression, Decision Tree, Random Forest, Multilayer Perceptron (MLP), k-Nearest Neighbors (KNN), and Support Vector Machine (SVM), were implemented and evaluated for student dropout prediction. Data preprocessing techniques such as cleaning, normalization, and handling of imbalanced class distributions were applied to improve model reliability and prediction performance. The dataset was divided into training and testing subsets, and 10-fold cross-validation was used to assess the robustness and generalization capability of the models. Classifier performance was evaluated using accuracy, precision, recall, and F1-score. Experimental results showed that the SVM achieved the highest classification performance, with an accuracy of 93.80%, precision of 0.939, recall of 0.938, and F1-score of 0.938, outperforming the remaining classifiers. Random Forest, MLP, KNN, Decision Tree, and Logistic Regression achieved comparatively lower prediction accuracies. The findings further indicate that academic performance, attendance percentage, assignment scores, engagement level, and socio-economic background are among the most influential factors in predicting student dropout.
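The reported SVM figures (accuracy 93.80%, recall 0.938) are consistent with support-weighted averaging of per-class scores. Under that scheme, weighted recall always equals accuracy, because the sum over classes of (n_c / N) · (TP_c / n_c) reduces to the total number of correct predictions divided by N. The abstract does not state which averaging scheme was used, so the minimal sketch below assumes weighted averaging. The function `classification_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1.

    Assumption: 'weighted' averaging (each class weighted by its share of
    true labels); the source abstract does not state the averaging scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        p_c = tp / predicted[c] if predicted[c] else 0.0
        r_c = tp / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n  # classes absent from y_true get weight 0
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    y_true = ["dropout", "dropout", "retained", "retained", "retained", "retained"]
    y_pred = ["dropout", "dropout", "dropout", "retained", "retained", "retained"]
    print(classification_metrics(y_true, y_pred))
```

In this toy case, weighted recall and accuracy are both 5/6, while weighted precision is 8/9. This is why a results table that reports weighted metrics will typically show recall matching accuracy while precision diverges slightly.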
The proposed machine learning framework provides an efficient and practical solution for identifying at-risk students at an early stage, enabling educational institutions to implement timely intervention strategies to improve student retention and academic success.
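The evaluation protocol summarized in the abstract combines normalization, 10-fold cross-validation, and classifiers such as KNN. It can be illustrated with the minimal pure-Python sketch below. This is not the authors' implementation, and several choices in it are assumptions made for illustration:

- the function names;
- round-robin stratification of the folds (the abstract does not say the folds were stratified);
- k = 3 neighbours;
- fitting the min-max bounds on each training fold only, so that test-fold statistics do not leak into training.

Stratification keeps class proportions similar across folds, but it does not rebalance classes. The imbalance handling mentioned in the abstract is therefore not reproduced here.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds roughly keep class proportions."""
    if len(labels) < k:
        raise ValueError("need at least k samples")
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets a similar share of it.
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)
        offset += len(idxs)
    for fold in folds:
        test = set(fold)
        yield [i for i in range(len(labels)) if i not in test], sorted(fold)


def fit_min_max(rows):
    """Per-feature (min, max), computed on training rows only to avoid leakage."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, bounds):
    """Scale features with training bounds; constant features map to 0.0, test values may leave [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(X, y, k=10, seed=0):
    """Mean KNN accuracy over stratified k folds, normalizing inside each fold."""
    scores = []
    for train_idx, test_idx in stratified_k_fold(y, k, seed):
        bounds = fit_min_max([X[i] for i in train_idx])
        train_X = apply_min_max([X[i] for i in train_idx], bounds)
        test_X = apply_min_max([X[i] for i in test_idx], bounds)
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, x) == y[i] for x, i in zip(test_X, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical features [attendance %, assignment score]; not the study's data.
    X = [[80 + i, 60 + i] for i in range(20)] + [[30 + i, 20 + i] for i in range(20)]
    y = ["retained"] * 20 + ["dropout"] * 20
    print(cross_validate(X, y, k=10))
```

Computing the normalization bounds inside each fold, rather than once on the full dataset, keeps every test fold unseen during preprocessing. This is the property cross-validation relies on to estimate generalization.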

Keywords: Educational data mining, Machine learning, Student dropout, Prediction, Classification

Published

2026-05-30

How to Cite

Rabia Bhutto, Kanwal Batool, Pireh Soomro, Muhammad Hasnain Ali, & Muhammad Noor Murtaza Qureshi. (2026). A Comprehensive Review of Machine Learning Algorithms for Predicting Student Dropouts in Educational Data Mining. 5(2), 1361–1370. Retrieved from https://www.assajournal.com/index.php/36/article/view/1799