Speech and multilingual natural language framework for speaker change detection and diarization

Or Haim Anidjar, Yannick Estève, Chen Hajaj, Amit Dvir, Itshak Lapidot

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Speaker Change Detection (SCD) is the problem of splitting an audio recording by its speaker turns. Many real-world problems, such as Speaker Diarization (SD) and automatic speech transcription, are influenced by the quality of the speaker-turn estimation. Previous works have already shown that auxiliary textual information (for mono-lingual systems) can be of great use for detecting speaker turns and for the performance of diarization systems. In this paper, we suggest a framework for speaker-turn estimation, as well as for determining clustered speaker identities for the SD system, and examine our approach over a multi-lingual dataset that consists of three mono-lingual datasets, in English, French, and Hebrew. As such, we propose a generic and language-independent framework for the SCD problem that is learned through textual information using state-of-the-art transformer-based techniques and speech-embedding modules. Comprehensive experimental evaluation shows that (i) our multi-lingual SCD framework is competitive with a framework over mono-lingual datasets, and that (ii) textual information improves the solution's quality compared to the speech signal-based approach. In addition, we show that our multi-lingual SCD approach does not harm the performance of SD systems.

Original language: English
Article number: 119238
Journal: Expert Systems with Applications
Volume: 213
DOIs
Publication status: Published - 1 Mar 2023
