TY - CONF
T1 - Optimized File Type Detection and One-Shot Retrieval
AU - Lisker, Simona
AU - Butman, Ayelet
AU - Hajaj, Chen
AU - Dubin, Ran
AU - Dvir, Amit
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - File type classification is critical in digital forensics and file carving. However, the increasing diversity of file formats challenges accurate classification. Traditional methods rely on hand-crafted features or compact neural networks but face long training times, limited training data, and lower accuracy. This paper introduces three novel, content-based file-type classification approaches to address these challenges. These approaches improve accuracy and streamline the integration of new file types using pre-trained models, enhancing both speed and reliability. The first approach utilizes Natural Language Processing (NLP) with a transformer architecture, while the second combines statistical features with a pre-trained model via transfer learning. These methods achieved accuracy rates of 72.4% and 69.2%, respectively, surpassing state-of-the-art Convolutional Neural Network (CNN) models. The third approach employs one-shot learning, achieving 100% accuracy in several scenarios, enabling efficient training with minimal data.
KW - digital forensics
KW - few-shot
KW - file type
KW - malware
KW - one-shot
UR - https://www.scopus.com/pages/publications/105018451961
U2 - 10.1109/ICC52391.2025.11161523
DO - 10.1109/ICC52391.2025.11161523
M3 - Conference contribution
AN - SCOPUS:105018451961
T3 - IEEE International Conference on Communications
SP - 1121
EP - 1126
BT - ICC 2025 - IEEE International Conference on Communications
A2 - Valenti, Matthew
A2 - Reed, David
A2 - Torres, Melissa
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Communications, ICC 2025
Y2 - 8 June 2025 through 12 June 2025
ER -