TY - JOUR
T1 - Hybrid Speech and Text Analysis Methods for Speaker Change Detection
AU - Anidjar, Or Haim
AU - Lapidot, Itshak
AU - Hajaj, Chen
AU - Dvir, Amit
AU - Gilad, Issachar
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2021
Y1 - 2021
N2 - Speaker Change Detection (SCD) is the task of segmenting an input audio recording at the points where the speaker changes. Many applications, such as Speaker Diarization (SD) and automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential step of the SD problem, the audio segmentation process: we propose a solution to the SCD problem, assign clustered speaker labels to the extracted segments, and apply the solution to two datasets, a commercial dataset in Hebrew and the ICSI Meeting Corpus. Specifically, we propose a hybrid framework for the SCD problem that is learned from textual information, speech signals, and the metadata features that can be extracted from them. Moreover, we demonstrate a negative correlation between the number of speakers in the training dataset and the overall diarization system's performance, which is improved by our efficient SCD component. Finally, we show that the proposed hybrid framework remains robust across the commercial Hebrew dataset and the ICSI Meeting Corpus, as training and testing in the experimental evaluation cover two languages.
AB - Speaker Change Detection (SCD) is the task of segmenting an input audio recording at the points where the speaker changes. Many applications, such as Speaker Diarization (SD) and automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential step of the SD problem, the audio segmentation process: we propose a solution to the SCD problem, assign clustered speaker labels to the extracted segments, and apply the solution to two datasets, a commercial dataset in Hebrew and the ICSI Meeting Corpus. Specifically, we propose a hybrid framework for the SCD problem that is learned from textual information, speech signals, and the metadata features that can be extracted from them. Moreover, we demonstrate a negative correlation between the number of speakers in the training dataset and the overall diarization system's performance, which is improved by our efficient SCD component. Finally, we show that the proposed hybrid framework remains robust across the commercial Hebrew dataset and the ICSI Meeting Corpus, as training and testing in the experimental evaluation cover two languages.
KW - D-vectors
KW - Speaker change detection
KW - Speaker diarization
KW - Speaker verification
KW - Speech analysis
UR - http://www.scopus.com/inward/record.url?scp=85111160293&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2021.3093817
DO - 10.1109/TASLP.2021.3093817
M3 - Article
AN - SCOPUS:85111160293
SN - 2329-9290
VL - 29
SP - 2324
EP - 2338
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 9468954
ER -