TY - JOUR
T1 - A blood test-based machine learning model for predicting lung cancer risk
AU - Schwartz, Lihi
AU - Matania, Naor
AU - Levi, Matanel
AU - Kushnir, Shiri
AU - Yosef, Noga
AU - Hoogi, Assaf
AU - Lazebnik, Teddy
AU - Shlomi, Dekel
N1 - Publisher Copyright: Copyright © 2025 Schwartz, Matania, Levi, Kushnir, Yosef, Hoogi, Lazebnik and Shlomi.
PY - 2025
Y1 - 2025
AB - Background: The goal of early detection is individual cancer prediction. For lung cancer (LC), age and smoking history are the primary criteria for annual low-dose CT screening, leaving other populations at risk of being overlooked. Machine learning (ML) is a promising method to identify complex patterns in the data that can reveal personalized disease predictors. Methods: An ML-based model was applied to blood test data collected before the diagnosis of LC, together with sociodemographic factors such as age and gender, among LC patients and controls to predict the risk of a future LC diagnosis. Results: In addition to age and gender, we identified 22 blood tests that contributed to the model. For the entire study population, the ML model predicted LC with an accuracy of 71.2%, a sensitivity of 63%, and a positive predictive value of 67.2%. Accuracy was higher among women than men (71.8% vs. 70.8%) and among never smokers than smokers (73.6% vs. 70.1%). Age was the most significant contributor (13.6%), followed by red blood cell distribution (5.1%), creatinine (5%), gender (3.6%), and mean corpuscular hemoglobin (3.3%). A majority of the blood tests made a highly variable contribution to the complex ML model; however, some tests, such as red cell distribution width, mean corpuscular hemoglobin, prothrombin time, hematocrit, urea, and calcium, contributed slightly more to a dichotomous prediction. Conclusion: Blood tests can be used in the proposed ML model to predict LC. More studies are needed in basic science fields to identify possible explanations for the association between specific blood results and LC prediction.
KW - artificial intelligence
KW - blood test
KW - lung cancer
KW - machine learning
KW - prediction model
UR - https://www.scopus.com/pages/publications/105009706217
U2 - 10.3389/fmed.2025.1577451
DO - 10.3389/fmed.2025.1577451
M3 - Article
AN - SCOPUS:105009706217
SN - 2296-858X
VL - 12
JO - Frontiers in Medicine
JF - Frontiers in Medicine
M1 - 1577451
ER -