TY - JOUR
T1 - The art of time-bending
T2 - Data augmentation and early prediction for efficient traffic classification
AU - Hajaj, Chen
AU - Aharon, Porat
AU - Dubin, Ran
AU - Dvir, Amit
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/10/15
Y1 - 2024/10/15
AB - The accurate identification of internet traffic is crucial for network management. However, encryption and constant changes in network protocols make it difficult to extract useful features for traffic classification. Limited data availability and a lack of diversity within the dataset pose further challenges. To address these issues, we propose a novel data augmentation technique that uses LSTM networks to generate synthetic data points closely resembling real traffic data. This significantly enriches the training dataset and improves classification efficiency. Our experiments show that combining LSTM-generated data with real traffic data leads to notable improvements in classification efficiency. We demonstrate the effectiveness of the methodology on both academic and commercial datasets. A classifier trained with the generated data achieved a 6% performance improvement. Moreover, when classifying with only half of the time window, and thus half of the signal, the approach achieved a 4% improvement over the original classifier. Including augmented samples in the training set improved both accuracy and F1-score. These findings demonstrate the practical utility of the data augmentation strategy for encrypted traffic classification, enabling earlier prediction with improved performance.
KW - Data augmentation
KW - Internet traffic classification
KW - Long Short-Term Memory (LSTM) networks
UR - http://www.scopus.com/inward/record.url?scp=85192674926&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.124166
DO - 10.1016/j.eswa.2024.124166
M3 - Article
AN - SCOPUS:85192674926
SN - 0957-4174
VL - 252
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 124166
ER -