Tidal Flood Prediction in Surabaya Based on Hydrometeorological Data Using Gradient Boosting and Logistic Regression
DOI: 10.29303/jpm.v20i6.10068
Published: 2025-10-15
Issue: Vol. 20 No. 6 (2025)
Keywords: Classification; Gradient Boosting; Hydrometeorology; Logistic Regression; Tidal Flooding
Abstract
This research develops a predictive model for tidal flooding at Tanjung Perak Port in Surabaya, an area identified as critical and highly susceptible to such events. The model draws on hydrometeorological indicators such as lunar cycles, tidal patterns, and precipitation, obtained from the BMKG Tanjung Perak Maritime Meteorological Station. A dataset of 26,275 records was compiled and randomly partitioned into a training set (80%) and a validation set (20%) to support the robustness and generalizability of the model. Data preparation included normalization, imputation of missing values, and weighting of each variable according to its degree of influence. Two machine learning methods were then used to build the predictive models: Gradient Boosting (specifically XGBoost) and Logistic Regression. Model performance was evaluated using accuracy, the confusion matrix, the ROC-AUC score, and feature importance analysis. The Gradient Boosting model achieved an accuracy of 99.96%, compared with 99.85% for Logistic Regression. Feature importance analysis showed that lunar cycles and tidal conditions were the principal determinants of tidal flooding, whereas precipitation had a comparatively minor effect. These results indicate that combining appropriate data preparation with machine learning methods can yield accurate predictions.
The principal contribution of this study is a computational framework that supports the development of an early warning system for tidal flooding, helping to reduce hazards and limit adverse social, financial, and operational impacts in coastal areas.
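The workflow summarized in the abstract (random 80/20 split, imputation, normalization, model fitting, and evaluation by accuracy, confusion matrix, and ROC-AUC) can be illustrated with a small, dependency-free Python sketch. It rests on several assumptions that are not from the paper: it runs on synthetic data rather than the BMKG records, the feature names `lunar_phase`, `tide_height_m`, and `rainfall_mm` and all helper functions are hypothetical, missing values are mean-imputed and inputs min-max scaled (the paper's exact methods and variable weighting are not reproduced), and only the Logistic Regression side is shown, fitted by plain gradient descent. The XGBoost model is not reproduced here; one option would be to swap in `xgboost.XGBClassifier` and inspect its `feature_importances_` attribute.

```python
import math
import random
from collections import Counter

# Hypothetical feature names/encodings; the study's actual BMKG variables are not reproduced.
FEATURES = ["lunar_phase", "tide_height_m", "rainfall_mm"]


def make_synthetic_data(n, seed=0):
    """Synthetic records (NOT the BMKG Tanjung Perak data); ~5% of rainfall values are missing."""
    rng, X, y = random.Random(seed), [], []
    for _ in range(n):
        lunar = rng.random()                     # fraction of the lunar cycle, 0..1 (assumed)
        tide = rng.gauss(1.2, 0.4)               # tide height in metres (synthetic)
        rain = rng.expovariate(0.2)              # rainfall in mm, mean 5 mm (synthetic)
        spring = math.cos(4 * math.pi * lunar)   # +1 near new and full moon (spring tides)
        score = 4 * (tide - 1.4) + 1.5 * spring + 0.05 * rain + rng.gauss(0, 0.5)
        X.append([lunar, tide, None if rng.random() < 0.05 else rain])
        y.append(1 if score > 0 else 0)
    return X, y


def train_val_split(X, y, train_frac=0.8, seed=42):
    """Random 80/20 partition into training and validation sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, va = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in va], [y[i] for i in va]


def fit_preprocessing(X_train):
    """Per-column mean (for imputation) and min/max (for scaling), from training rows only."""
    stats = []
    for j in range(len(X_train[0])):
        col = [row[j] for row in X_train if row[j] is not None]
        stats.append((sum(col) / len(col), min(col), max(col)))
    return stats


def transform(X, stats):
    """Mean-impute missing values, then min-max scale each column using training statistics."""
    out = []
    for row in X:
        filled = [mean if v is None else v for v, (mean, _, _) in zip(row, stats)]
        out.append([(v - lo) / (hi - lo) if hi > lo else 0.0
                    for v, (_, lo, hi) in zip(filled, stats)])
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Sketch: unregularized logistic regression via full-batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi   # d(log-loss)/d(logit)
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels."""
    c = Counter(zip(y_true, y_pred))
    return c[(1, 1)], c[(0, 1)], c[(1, 0)], c[(0, 0)]


def roc_auc(y_true, scores):
    """ROC-AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X, y = make_synthetic_data(1000)
    X_tr, y_tr, X_va, y_va = train_val_split(X, y)
    stats = fit_preprocessing(X_tr)
    w, b = fit_logistic(transform(X_tr, stats), y_tr)
    probs = [predict_proba(x, w, b) for x in transform(X_va, stats)]
    tp, fp, fn, tn = confusion(y_va, [1 if p >= 0.5 else 0 for p in probs])
    print("accuracy:", (tp + tn) / len(y_va), "| confusion (TP, FP, FN, TN):", (tp, fp, fn, tn))
    print("ROC-AUC:", round(roc_auc(y_va, probs), 3))
    print("coefficients on scaled inputs:", dict(zip(FEATURES, (round(v, 2) for v in w))))
```

Fitting the imputation means and min/max bounds on the training split only, then reusing them on the validation split, is a design choice that keeps validation information out of preprocessing. Coefficients on min-max-scaled inputs are only a rough analogue of the paper's feature importance analysis, and any numbers this sketch prints describe the synthetic data, not the study's results.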
Author Biographies
Kartika Dwi Indra Setyaningrum, Department of Physics, Universitas Negeri Surabaya
Kiki Syalasyatun Masfufah, Department of Physics, Universitas Negeri Surabaya
Endah Rahmawati, Department of Physics, Universitas Negeri Surabaya
Ady Hermanto, BMKG, Tanjung Perak Surabaya
License
Copyright (c) 2025 Kartika Dwi Indra Setyaningrum, Kiki Syalasyatun Masfufah, Endah Rahmawati, Ady Hermanto

This work is licensed under a Creative Commons Attribution 4.0 International License.
The following terms apply to authors who publish in this journal:
1. Authors retain copyright and grant the journal first publication rights, with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0) that allows others to share the work with an acknowledgment of the work's authorship and first publication in this journal.
2. Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), acknowledging its initial publication in this journal.
3. Before and during the submission process, authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website), as this can lead to productive exchanges as well as earlier and greater citation of published work.