TY - GEN
T1 - Clustering the Unknown - The Youtube Case
AU - Dvir, Amit
AU - Marnerides, Angelos K.
AU - Dubin, Ran
AU - Golan, Nehor
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/4/8
Y1 - 2019/4/8
N2 - Recent stringent end-user security and privacy requirements have caused a dramatic rise in encrypted video streams, of which YouTube encrypted traffic is one of the most prevalent. Despite their encrypted nature, metadata derived from such traffic flows can be used to identify the title of a video, thus enabling the classification of video streams into a single video title from a given video title set. Nonetheless, scenarios in which no video title set is available and a supervised approach is not feasible are both frequent and challenging. In this paper we go beyond previous studies and demonstrate the feasibility of clustering unknown video streams into subgroups even when no information about the title name is available. We address this problem by exploring Natural Language Processing (NLP) formulations and Word2vec techniques to compose a novel statistical feature for clustering unknown video streams. Through experiments over real datasets, we demonstrate that our methodology is capable of clustering 72 out of 100 video titles from a dataset of 10,000 video streams. Thus, we argue that the proposed methodology could contribute to the newly emerging and demanding domain of encrypted Internet traffic classification.
KW - Clustering
KW - Encrypted Traffic
KW - Video Title
UR - http://www.scopus.com/inward/record.url?scp=85064982836&partnerID=8YFLogxK
U2 - 10.1109/ICCNC.2019.8685364
DO - 10.1109/ICCNC.2019.8685364
M3 - Conference contribution
AN - SCOPUS:85064982836
T3 - 2019 International Conference on Computing, Networking and Communications, ICNC 2019
SP - 402
EP - 407
BT - 2019 International Conference on Computing, Networking and Communications, ICNC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Conference on Computing, Networking and Communications, ICNC 2019
Y2 - 18 February 2019 through 21 February 2019
ER -