Impact of SMOTE Oversampling on Classifying Band Gap Types in Imbalanced ABO₃ Perovskite Oxides
DOI: 10.29303/jpft.v12i1.11479
Published: 2026-04-30
Abstract
This study investigates the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on classifying direct and indirect band gap types in an imbalanced dataset of ABO₃ perovskite oxides. In this dataset, the direct band gap class accounts for approximately 84% of the samples and the indirect class for only 16%, which biases conventional classification models toward the majority class. To address this imbalance, SMOTE was used to balance the class distribution, and its effect was evaluated with several machine learning algorithms: Multi-Layer Perceptron (MLP), Extra Trees, CatBoost, and Gradient Boosting. Model performance was assessed with 5-fold stratified cross-validation, with particular emphasis on the F1-macro and recall metrics so that performance on the minority class is adequately reflected. The results show that SMOTE did not improve overall accuracy (baseline: 0.89; SMOTE: 0.88), but it did improve the models' ability to recognize the minority class: F1-macro increased from 0.76 to 0.78 for MLP and from 0.75 to 0.78 for CatBoost. These findings highlight F1-macro as a more informative evaluation metric than accuracy for imbalanced datasets and provide methodological insights for developing more robust predictive models in materials informatics.
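The two ideas in the abstract, oversampling the minority class with SMOTE and judging models by F1-macro rather than accuracy, can be illustrated with a short, dependency-free Python sketch. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `smote`, `balance`, `f1_macro`, and `accuracy` are hypothetical, labels are assumed to be the strings `"direct"` and `"indirect"`, features are assumed to be numeric tuples, and `balance` assumes exactly two classes. `smote` follows the standard SMOTE interpolation rule: each synthetic point lies at a random position on the segment joining a minority sample to one of its k nearest minority-class neighbours. The abstract does not state where in the cross-validation loop SMOTE was applied; the sketch assumes it is applied to training data only.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each placed at a random position on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x; keep the k nearest.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a TRAINING split (two classes assumed)
    until both classes contain the same number of samples."""
    minority = [x for x, t in zip(X, y) if t == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    new = smote(minority, max(n_new, 0), k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * len(new)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over every label seen
    in y_true or y_pred, so the minority class counts as much as the majority."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


# A "classifier" that always predicts the majority class on an 84/16 split.
y_true = ["direct"] * 84 + ["indirect"] * 16
y_pred = ["direct"] * 100
print(accuracy(y_true, y_pred))            # 0.84
print(round(f1_macro(y_true, y_pred), 3))  # 0.457

# Balance a toy training split: 8 direct vs 3 indirect samples in a 2-D feature space.
X_train = [(1.0, 0.0)] * 8 + [(0.0, 1.0), (0.1, 1.1), (0.2, 0.9)]
y_train = ["direct"] * 8 + ["indirect"] * 3
X_bal, y_bal = balance(X_train, y_train, minority_label="indirect")
print(y_bal.count("direct"), y_bal.count("indirect"))  # 8 8
```

The all-"direct" example shows why accuracy alone is misleading on this kind of split: a model that never predicts the indirect class still reaches 0.84 accuracy, while its F1-macro is only about 0.46. In a 5-fold stratified cross-validation, `balance` would be called on the four training folds of each split, and `f1_macro` on predictions for the untouched held-out fold. Oversampling before splitting would let synthetic points derived from validation samples leak into training and inflate the scores.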
Keywords: SMOTE; Class Imbalance; Perovskite Oxide; ABO₃ band gap classification; Direct vs Indirect
License
Copyright (c) 2026 Desvita Maharani, Johana Oktavia Ramadhani, Aliyah Zahratu Rizqi, Muhamad Akrom

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with Jurnal Pendidikan Fisika dan Teknologi (JPFT) agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA). This license allows authors to use all articles, data sets, graphics, and appendices in data mining applications, search engines, websites, blogs, and other platforms, provided that an appropriate reference is given. The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in Jurnal Pendidikan Fisika dan Teknologi (JPFT).
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

