Impact of SMOTE Oversampling on Classifying Band Gap Types in Imbalanced ABO₃ Perovskite Oxides

Authors

Desvita Maharani, Johana Oktavia Ramadhani, Aliyah Zahratu Rizqi, Muhamad Akrom

DOI: 10.29303/jpft.v12i1.11479

Published: 2026-04-30

Abstract

This study investigates the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on the classification of direct and indirect band gap types in an imbalanced ABO₃ perovskite oxide dataset. The direct band gap class accounts for approximately 84% of the samples and the indirect class for only 16%, which biases conventional classifiers toward the majority class. To address this imbalance, SMOTE was used to balance the class distribution, and its effect was evaluated with several machine learning algorithms: Multi-Layer Perceptron (MLP), Extra Trees, CatBoost, and Gradient Boosting. Model performance was assessed with 5-fold stratified cross-validation, with particular emphasis on F1-macro and recall so that performance on the minority class is adequately evaluated. Although SMOTE did not improve overall accuracy (baseline: 0.89; SMOTE: 0.88), it improved the models' ability to recognize the minority class: F1-macro increased from 0.76 to 0.78 for MLP and from 0.75 to 0.78 for CatBoost. These findings highlight the importance of using F1-macro rather than accuracy as the evaluation metric for imbalanced datasets and provide methodological insights for developing more robust predictive models in materials informatics.
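The abstract combines three ingredients: SMOTE interpolation, 5-fold stratified cross-validation, and F1-macro scoring. The following is a minimal, dependency-free Python sketch of how they can fit together; it is not the authors' code. The function names `smote`, `stratified_folds`, and `f1_macro` are hypothetical, the feature vectors are random toy data rather than perovskite descriptors, and applying SMOTE only to each training fold (never to the held-out fold) is an assumption reflecting common practice, since the abstract does not say where the resampling step sits.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling for a list of numeric feature tuples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor, uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


def stratified_folds(y, n_splits=5, seed=0):
    """Split indices into folds that each keep roughly the same class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)  # deal each class round-robin
    return folds


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels with the 84/16 direct/indirect ratio reported in the abstract.
    y = ["direct"] * 84 + ["indirect"] * 16
    always_direct = ["direct"] * len(y)
    acc = sum(t == p for t, p in zip(y, always_direct)) / len(y)
    print(f"accuracy={acc:.2f}  f1_macro={f1_macro(y, always_direct):.3f}")
    # -> accuracy=0.84  f1_macro=0.457

    rng = random.Random(1)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in y]  # hypothetical toy features
    for fold, test_idx in enumerate(stratified_folds(y, n_splits=5)):
        test = set(test_idx)
        train = [i for i in range(len(y)) if i not in test]
        minority = [X[i] for i in train if y[i] == "indirect"]
        n_major = sum(y[i] == "direct" for i in train)
        # Oversample the training fold only; the test fold stays untouched.
        extra = smote(minority, n_major - len(minority), k=5, seed=fold)
        print(f"fold {fold}: indirect {len(minority)} -> {len(minority) + len(extra)}, "
              f"direct {n_major}, test size {len(test_idx)}")
        # A classifier (e.g. MLP or CatBoost) would be fit here on the
        # resampled training fold and scored with f1_macro on the test fold.
```

The first printed line shows why accuracy alone misleads on an 84/16 split: a model that always predicts "direct" reaches 0.84 accuracy but only about 0.457 F1-macro, because its F1 on the indirect class is zero. For real experiments, imbalanced-learn's `SMOTE` placed inside an `imblearn.pipeline.Pipeline` and scored with scikit-learn's `cross_validate(..., cv=StratifiedKFold(n_splits=5), scoring="f1_macro")` performs the same fold-local resampling.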

Keywords:

SMOTE; Class Imbalance; Perovskite Oxide ABO3; band gap classification; Direct vs Indirect

Author Biographies

Desvita Maharani, Dian Nuswantoro University

Author Origin: Indonesia

Johana Oktavia Ramadhani, Dian Nuswantoro University

Author Origin: Indonesia

Aliyah Zahratu Rizqi, Dian Nuswantoro University

Author Origin: Indonesia

Muhamad Akrom, Dian Nuswantoro University

Author Origin: Indonesia

How to Cite

Maharani, D., Ramadhani, J. O., Rizqi, A. Z., & Akrom, M. (2026). Impact of SMOTE Oversampling on Classifying Band Gap Types in Imbalanced ABO₃ Perovskite Oxides. Jurnal Pendidikan Fisika Dan Teknologi, 12(1). https://doi.org/10.29303/jpft.v12i1.11479