A thousand words are worth more than one recording: Word-embedding based speaker change detection

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. This task is essential for many applications, such as automatic voice transcription and Speaker Diarization (SD). This paper focuses on audio segmentation and proposes a word-embedding-based solution to the SCD problem. Moreover, we show how our approach can be used to outperform voice-based solutions to the SD problem. We empirically show that our method accurately identifies the speaker turns in an audio recording, achieving 82.12% Recall and an 89.02% F1-score.
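As an illustration of how Recall and F1-score can be read for a change-detection task, the sketch below scores predicted speaker turns against reference turns at the word level. It is a minimal example under stated assumptions, not the paper's evaluation protocol: the function names `change_points` and `scd_scores`, the exact-match criterion (no tolerance window), and the toy label sequences are all hypothetical.

```python
def change_points(speaker_labels):
    """Return the word indices at which the speaker differs from the previous word."""
    return {
        i
        for i in range(1, len(speaker_labels))
        if speaker_labels[i] != speaker_labels[i - 1]
    }


def scd_scores(true_labels, predicted_labels):
    """Precision, recall and F1 over speaker-change positions.

    Assumption: a predicted change counts as correct only if it falls on
    exactly the same word index as a reference change.
    """
    truth = change_points(true_labels)
    predicted = change_points(predicted_labels)
    true_positives = len(truth & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): one speaker label per transcribed word.
true_labels = ["A", "A", "A", "B", "B", "A", "A", "B"]
predicted_labels = ["A", "A", "A", "B", "B", "B", "A", "B"]

precision, recall, f1 = scd_scores(true_labels, predicted_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Only the positions of the changes matter here, so the scores are unaffected by how the speakers are named in either sequence.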

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Pages: 3121-3125
Number of pages: 5
ISBN (Electronic): 9781713836902
DOIs
State: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 4
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 to 3/09/21

Keywords

  • Clustering
  • Speaker change detection
  • Speaker diarization
  • Speech recognition
  • Word embedding
