TY - JOUR
T1 - Pulling the carpet below the learner's feet
T2 - Genetic algorithm to learn ensemble machine learning model during concept drift
AU - Lazebnik, Teddy
N1 - Publisher Copyright: © 2025 The Author
PY - 2025/7/15
Y1 - 2025/7/15
AB - Data-driven models, in general, and machine learning (ML) models, in particular, have gained popularity over recent years with an increased usage of such models across the scientific and engineering domains. When using ML models in realistic and dynamic environments, users often need to handle the challenge of concept drift (CD). In this study, we explore the application of genetic algorithms (GAs) to address the challenges posed by CD in such settings. Formally, we propose a novel two-level ensemble ML model, which combines a global ML model with a CD detector, operating as an aggregator for a population of ML pipeline models, each one with an adjusted CD detector by itself responsible for re-training its ML model. In addition, we show that one can further improve the proposed model by utilizing off-the-shelf automatic ML (AutoML) methods. Through extensive synthetic dataset analysis, we show that the proposed model statistically significantly outperforms an ML pipeline with a CD algorithm, particularly in scenarios with unknown CD characteristics or a mixture of moving and shifting CDs. Moreover, we show a sub-linear decline in the proposed method's performance with respect to a higher drifting rate and robustness to the underlying AutoML method utilized.
KW - Automatic machine learning
KW - Concept drift
KW - Ensemble machine learning
KW - Heuristic optimization
UR - http://www.scopus.com/inward/record.url?scp=105002650916&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2025.110772
DO - 10.1016/j.engappai.2025.110772
M3 - Article
AN - SCOPUS:105002650916
SN - 0952-1976
VL - 152
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 110772
ER -