Speech and multilingual natural language framework for speaker change detection and diarization

Or Haim Anidjar, Yannick Estève, Chen Hajaj, Amit Dvir, Itshak Lapidot

Research output: Contribution to journal › Article › peer-review

4 citations (Scopus)

Abstract

Speaker Change Detection (SCD) is the problem of splitting an audio recording by its speaker turns. Many real-world problems, such as Speaker Diarization (SD) and automatic speech transcription, are influenced by the quality of speaker-turn estimation. Previous work has shown that auxiliary textual information (for monolingual systems) can be of great use for detecting speaker turns and for the performance of diarization systems. In this paper, we suggest a framework for speaker-turn estimation, as well as for determining clustered speaker identities for the SD system, and examine our approach on a multilingual dataset that consists of three monolingual datasets, in English, French, and Hebrew. As such, we propose a generic, language-independent framework for the SCD problem that is learned from textual information using state-of-the-art transformer-based techniques and speech-embedding modules. Comprehensive experimental evaluation shows that (i) our multilingual SCD framework is competitive with a framework over monolingual datasets, and (ii) textual information improves the solution's quality compared to the speech-signal-based approach. In addition, we show that our multilingual SCD approach does not harm the performance of SD systems.

Original language: English
Article number: 119238
Journal: Expert Systems with Applications
Volume: 213
Publication status: Published - 1 March 2023
