Harnessing the power of Wav2Vec2 and CNNs for Robust Speaker Identification on the VoxCeleb and LibriSpeech Datasets

Research output: Contribution to journal › Article › peer-review

10 citations (Scopus)

Abstract

Speaker identification, a cornerstone of speech processing, associates spoken segments with individuals from a known speaker pool. This paper presents an AI framework tailored for closed-set speaker identification and emphasizes its practical engineering application in speech analysis. The framework enhances the neural network architecture, focusing in particular on optimizing the Log-Softmax function, which is central to speaker attribution. Additionally, we incorporate data augmentation techniques into the Wav2Vec2 framework. Empirical validation demonstrates the framework's efficacy, yielding a relative improvement of up to 3.16% in top-1 accuracy compared to the state of the art. These results set a new benchmark for closed-set speaker identification. The methodology also serves as a basis for advancing speaker identification methods in engineering applications, underlining the potential of AI-driven approaches in this domain.

Original language: English
Article number: 124671
Journal: Expert Systems with Applications
Volume: 255
DOIs
Publication status: Published - 1 December 2024
