TY - GEN
T1 - Robust Malicious Domain Detection
AU - Hason, Nitay
AU - Dvir, Amit
AU - Hajaj, Chen
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Malicious domains are increasingly common and pose a severe cybersecurity threat. Specifically, many types of current cyber attacks use URLs for attack communications (e.g., C&C, phishing, and spear-phishing). Despite the continuous progress in detecting these attacks, many alarming problems remain open, such as the weak spots of the defense mechanisms. Because ML has become one of the most prominent methods of malware detection, we propose a robust feature selection mechanism that results in malicious domain detection models that are resistant to black-box evasion attacks. This paper makes two main contributions. Our mechanism exhibits high performance based on data collected from ~5000 benign active URLs and ~1350 malicious active (attacks) URLs. We also provide an analysis of robust feature selection based on widely used features in the literature. Note that even though we cut the feature set dimensional space in half (from nine to four features), we still improve the performance of the classifier (an increase in the model’s F1-score from 92.92% to 95.81%). The fact that our models are robust to malicious perturbations but are also useful for clean data demonstrates the effectiveness of constructing a model that is solely trained on robust features.
AB - Malicious domains are increasingly common and pose a severe cybersecurity threat. Specifically, many types of current cyber attacks use URLs for attack communications (e.g., C&C, phishing, and spear-phishing). Despite the continuous progress in detecting these attacks, many alarming problems remain open, such as the weak spots of the defense mechanisms. Because ML has become one of the most prominent methods of malware detection, we propose a robust feature selection mechanism that results in malicious domain detection models that are resistant to black-box evasion attacks. This paper makes two main contributions. Our mechanism exhibits high performance based on data collected from ~5000 benign active URLs and ~1350 malicious active (attacks) URLs. We also provide an analysis of robust feature selection based on widely used features in the literature. Note that even though we cut the feature set dimensional space in half (from nine to four features), we still improve the performance of the classifier (an increase in the model’s F1-score from 92.92% to 95.81%). The fact that our models are robust to malicious perturbations but are also useful for clean data demonstrates the effectiveness of constructing a model that is solely trained on robust features.
KW - Domain
KW - Malware detection
KW - Robust features
UR - http://www.scopus.com/inward/record.url?scp=85087775961&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-49785-9_4
DO - 10.1007/978-3-030-49785-9_4
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85087775961
SN - 9783030497842
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 45
EP - 61
BT - Cyber Security Cryptography and Machine Learning - 4th International Symposium, CSCML 2020, Proceedings
A2 - Dolev, Shlomi
A2 - Weiss, Gera
A2 - Kolesnikov, Vladimir
A2 - Lodha, Sachin
T2 - 4th International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2020
Y2 - 2 July 2020 through 3 July 2020
ER -