Khan, Usman T.
Snieder, Everett Joshua
2024-11-07
2024-11-07
2024-08-09
2024-11-07
https://hdl.handle.net/10315/42516

Floods constitute a major threat to populations and infrastructure. Flood frequency and severity are projected to increase due to factors such as climate change and urbanisation. Flood early warning systems (FEWS), which rely on models that predict streamflow, provide relevant groups (e.g., transportation authorities, police, and schools) with advance notice of flood risk, allowing them to respond early and minimise flood damage. However, Canada has inadequate flood forecasting infrastructure and lacks a national system. To address the projected flood risk, there is a crucial need to improve modelling capabilities in Canadian watersheds. This manuscript-based dissertation presents a series of case studies that propose methodological improvements to hydrological modelling frameworks, focusing on the selection of training data for flood forecasting studies. The studies also demonstrate applications of machine learning (ML) to improve model accuracy and reliability. They span a wide range of spatiotemporal conditions, from flash-flood forecasting at sub-hourly frequencies in small urban watersheds to nationwide, multi-day forecasts. Each of the four studies exploits hydrologically diverse training data to improve the performance of models used in FEWS. The first manuscript introduces a novel ensemble model framework for traditional, physics-based urban stormwater models. The method leverages the concept of equifinality to generate ensembles with diverse parameter value estimates, which are shown to outperform traditionally calibrated models. The second and third manuscripts evaluate pure ML-based ensembles. The second manuscript simplifies flood forecasting by framing it as a binary classification problem.
The Synthetic Minority Oversampling TEchnique (SMOTE) algorithm is applied to increase the proportion of flood samples in the dataset, which is shown to improve flood forecast accuracy at the expense of an increased number of false positives. Next, three ensemble algorithms and five classifiers are systematically compared; extreme learning machines (ELMs) and support vector machines (SVMs) are found to be the strongest classifiers. The third manuscript proposes an improvement to popular ensemble algorithms, which consists of embedding synthetic oversampling within the ensemble loop to increase the diversity of ensemble member predictions. The fourth manuscript evaluates a novel cluster-based training data selection framework for regionally trained deep learning models. The method is used to show that data from hydrologically dissimilar basins is more useful for improving performance in a target basin than data from the target basin itself or from proximal basins. Collectively, the results of these studies advance techniques for improving model performance, especially during high streamflow conditions, which are most important to flood warning systems.

Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
Hydrologic sciences; Artificial intelligence; Civil engineering
Towards an improved understanding of the importance of hydrologically diverse data for training flood forecasting models
Electronic Thesis or Dissertation
2024-11-07
Flood forecasting; Rainfall-runoff; Machine learning; LSTM; Clustering; Diversity; Hydrology; Flow forecasting; Flood early warning systems; Artificial neural networks; Ensembles; Peak flow prediction