Optimized File Type Detection and One-Shot Retrieval

Simona Lisker, Ayelet Butman, Chen Hajaj, Ran Dubin, Amit Dvir

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

File type classification is critical in digital forensics and file carving. However, the growing diversity of file formats makes accurate classification harder. Traditional methods rely on hand-crafted features or compact neural networks, but they suffer from long training times, limited training data, and lower accuracy. This paper introduces three novel content-based file-type classification approaches that address these challenges. By building on pre-trained models, these approaches improve accuracy and streamline the integration of new file types, making classification both faster and more reliable. The first approach applies Natural Language Processing (NLP) with a transformer architecture, while the second combines statistical features with a pre-trained model via transfer learning. These two methods achieved accuracy rates of 72.4% and 69.2%, respectively, surpassing state-of-the-art Convolutional Neural Network (CNN) models. The third approach employs one-shot learning and achieves 100% accuracy in several scenarios, enabling efficient training with minimal data.
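
The one-shot idea above (labeling a fragment using only a single reference example per file type) can be illustrated with a minimal nearest-neighbor sketch. This is not the authors' method: the paper's approaches rely on pre-trained models, whereas here a normalized byte-frequency histogram (a common statistical feature) stands in as an assumed embedding, and the names `byte_histogram`, `cosine`, and `one_shot_classify` are hypothetical.

```python
import math
from collections import Counter


def byte_histogram(fragment: bytes) -> list[float]:
    """Normalized 256-bin byte-frequency vector.

    Assumption: a simple statistical feature used as a stand-in for the
    pre-trained embedding the paper relies on.
    """
    counts = Counter(fragment)
    total = len(fragment) or 1  # avoid division by zero for empty input
    return [counts.get(b, 0) / total for b in range(256)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def one_shot_classify(query: bytes, support: dict[str, bytes]) -> str:
    """Hypothetical one-shot retrieval: return the label of the single
    reference fragment whose feature vector is most similar to the query."""
    q = byte_histogram(query)
    return max(support, key=lambda label: cosine(q, byte_histogram(support[label])))


if __name__ == "__main__":
    # One reference fragment per type (toy data, not from the paper).
    support = {
        "text": b"The quick brown fox jumps over the lazy dog. " * 4,
        "binary": bytes(range(256)),
    }
    print(one_shot_classify(b"hello world, this is plain text.", support))  # prints: text
```

In this sketch, adding a new file type only requires one more entry in `support`, which mirrors the abstract's point about integrating new types with minimal data; replacing `byte_histogram` with a learned embedding would leave the retrieval step unchanged.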

Original language: English
Title of host publication: ICC 2025 - IEEE International Conference on Communications
Editors: Matthew Valenti, David Reed, Melissa Torres
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1121-1126
Number of pages: 6
ISBN (Electronic): 9798331505219
State: Published - 2025
Event: 2025 IEEE International Conference on Communications, ICC 2025 - Montreal, Canada
Duration: 8 Jun 2025 to 12 Jun 2025

Publication series

Name: IEEE International Conference on Communications
ISSN (Print): 1550-3607

Conference

Conference: 2025 IEEE International Conference on Communications, ICC 2025
Country/Territory: Canada
City: Montreal
Period: 8/06/25 to 12/06/25

Keywords

  • digital forensics
  • few-shot
  • file type
  • malware
  • one-shot
