Hybrid Speech and Text Analysis Methods for Speaker Change Detection

Or Haim Anidjar, Itshak Lapidot, Chen Hajaj, Amit Dvir, Issachar Gilad

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. Many applications, such as Speaker Diarization (SD) and automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential step of the SD problem, the audio segmentation process: we propose a solution for the SCD problem, as well as for assigning clustered speaker labels to the extracted segments, and apply it to two datasets: a commercial dataset in Hebrew and the ICSI Meeting Corpus. To this end, we propose a hybrid framework for the SCD problem that learns from textual information, speech signals, and the metadata features that can be extracted from them. Moreover, we demonstrate a negative correlation between the number of speakers in the training dataset and the overall diarization system's performance, and show that this performance is improved by our efficient SCD component. Finally, we show that our proposed hybrid framework remains robust on the ICSI Meeting Corpus, even though the experimental evaluation's training and testing span two languages.
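The abstract does not spell out the detection step itself. As a rough illustration of what SCD and the subsequent label assignment mean, the following Python sketch uses a simple audio-only baseline, not the authors' hybrid speech-and-text framework. It assumes one d-vector-style embedding per fixed-length audio window (the input is hypothetical), hypothesizes a speaker change wherever the cosine distance between consecutive embeddings exceeds a threshold (0.5 is an arbitrary assumption), and then greedily clusters the resulting segments into speaker labels. All function names are illustrative.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def detect_speaker_changes(embeddings: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Return indices i where a change is hypothesized between window i-1 and window i.

    embeddings: shape (num_windows, dim), one (hypothetical) d-vector per window.
    """
    changes = []
    for i in range(1, len(embeddings)):
        if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold:
            changes.append(i)
    return changes


def segments_from_changes(num_windows: int, changes: list[int]) -> list[tuple[int, int]]:
    """Turn change points into half-open [start, end) window segments."""
    bounds = [0] + changes + [num_windows]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


def label_segments(embeddings: np.ndarray, segments: list[tuple[int, int]],
                   threshold: float = 0.5) -> list[int]:
    """Greedy clustering of segments into speaker labels.

    Each speaker is represented by the mean embedding of the first segment
    assigned to it (not updated afterwards). A segment joins the closest
    speaker if within `threshold`, otherwise it opens a new speaker.
    """
    centroids: list[np.ndarray] = []
    labels: list[int] = []
    for start, end in segments:
        mean = embeddings[start:end].mean(axis=0)
        dists = [cosine_distance(mean, c) for c in centroids]
        if dists and min(dists) <= threshold:
            labels.append(int(np.argmin(dists)))
        else:
            centroids.append(mean)
            labels.append(len(centroids) - 1)
    return labels


# Toy usage: two synthetic "speakers" along orthogonal directions (A, A, B, B, A).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]])
changes = detect_speaker_changes(emb)            # [2, 4]
segs = segments_from_changes(len(emb), changes)  # [(0, 2), (2, 4), (4, 5)]
labels = label_segments(emb, segs)               # [0, 1, 0]
```

A fixed distance threshold is the simplest possible decision rule; in practice it would be tuned on held-out data or replaced by a learned classifier, which is closer in spirit to the learned hybrid framework the abstract describes.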

Original language: English
Article number: 9468954
Pages (from-to): 2324-2338
Number of pages: 15
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 29
State: Published, 2021

Keywords

  • D-vectors
  • Speaker change detection
  • Speaker diarization
  • Speaker verification
  • Speech analysis

