Time-Series Forecasting of Urban Air Pollutant Levels Using Deep Learning Models

Research Article

Volume 7 Issue 4

Husein Harun, Michael Nsor, Edidiong Elijah Akpan, Udodirim Ogwo-Ude, Gloria Nwachukwu Ogochukwu, Bokolo Wanengimorte George* and Josiah Nii Armah Tettey

September 23, 2025

DOI: 10.56831/PSEN-07-225

Abstract

Accurate forecasting of urban air pollutants is critical for public health management and regulatory compliance. Although machine learning models are widely used for this task, their performance depends heavily on the robustness of the underlying methodological framework. This study proposes and validates a comprehensive, end-to-end framework for forecasting hourly carbon monoxide (CO) concentrations using the canonical UCI Air Quality dataset. The methodology integrates context-aware data imputation, an advanced feature engineering pipeline (incorporating temporal, cyclical, autoregressive, and rolling-window features), and a rigorous comparative evaluation of fourteen distinct machine learning models. Crucially, all models are validated using a 5-fold TimeSeriesSplit cross-validation protocol to preserve temporal ordering and prevent lookahead bias. The results demonstrate the clear superiority of ensemble methods, with an optimized XGBoost model emerging as the top performer, achieving an R-squared score of 0.9216 and a Root Mean Squared Error of 0.3824 mg/m³. Feature importance analysis revealed that the model’s predictions were overwhelmingly driven by non-methane hydrocarbon (NMHC) sensor readings and their engineered non-linear terms, confirming that the model learned scientifically sound relationships. This study validates a synergistic framework in which advanced feature engineering paired with powerful ensemble models provides a new benchmark for accuracy and offers a viable template for developing reliable, real-world air quality forecasting systems.

Keywords: Air Quality Forecasting; Time-Series Analysis; Machine Learning; Feature Engineering; Gradient Boosting; Ensemble Methods; Pollutant Prediction
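
The feature types and the time-ordered validation named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the lag of 1 hour, the 3-hour rolling window, and the synthetic CO series are all assumptions made for illustration. The split function is a hand-rolled expanding-window scheme in the spirit of scikit-learn's TimeSeriesSplit, written with the standard library only so that it runs anywhere.

```python
import math

def cyclical_hour(hour):
    """Encode hour-of-day on the unit circle so 23:00 and 00:00 end up close."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def build_features(co, hours, lag=1, window=3):
    """Build (features, targets) using only readings strictly before each target.

    co    : hourly CO readings, oldest first
    hours : hour-of-day (0-23) for each reading
    lag, window : illustrative choices, not values taken from the paper
    """
    start = max(lag, window)
    rows, targets = [], []
    for t in range(start, len(co)):
        past = co[t - window:t]            # slice excludes co[t], so no lookahead
        sin_h, cos_h = cyclical_hour(hours[t])
        rows.append([co[t - lag], sum(past) / window, sin_h, cos_h])
        targets.append(co[t])
    return rows, targets

def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx); each test fold lies after all of its training data."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx

# Synthetic two-day series (made-up values, for demonstration only).
hours = [h % 24 for h in range(48)]
co = [1.0 + 0.5 * math.sin(2 * math.pi * h / 24) for h in range(48)]

X, y = build_features(co, hours)
splits = list(expanding_window_splits(len(X), n_splits=5))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Because every training index precedes every test index in each fold, a model fitted on the training rows never sees readings from the period it is scored on, which is the lookahead-bias safeguard the abstract refers to. A real pipeline would more likely build the features with a dataframe library and use the library-provided splitter directly.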
