Score Fusion for Speaker Identification using MFCC and ICMC Features by Sonal Joshi

By:

Joshi, Sonal

Contributor(s):

Yadav, Sandeep Kumar

Material type: Text

Publication details: IIT Jodhpur, Department of Electrical Engineering, 2017
Description: xi, 58 p. HB

DDC classification:

621.36 J837S

Summary: "The task of Speaker Identification (SID), or Speaker Recognition, is to recognise a person from a given speech utterance, that is, to answer the question, 'Whose voice is this?' An important application of SID is in forensics, to verify the identity of a suspect. Beyond forensics, the technology is used to improve the performance of speech recognition, to adjust preferences automatically to personal needs (as in home automation), and to identify the speaker in each segment of a teleconference or newsroom discussion (Speaker Diarization). Although real-world applications demand robustness to a variety of practical conditions, SID systems generally perform poorly when there is a mismatch. Different recording conditions in the training and testing data lead to mismatch, which can take the form of language mismatch, session mismatch, sensor mismatch, or any combination of these. To improve speaker recognition performance in mismatch scenarios, this work explores score fusion of log-likelihood scores obtained using a Gaussian Mixture Model - Universal Background Model (GMM-UBM) classifier. After an initial study of commonly used features on the TIMIT database, GMM-UBM systems using Mel Frequency Cepstral Coefficients (MFCC) and the recently proposed Infinite impulse response Constant Q Mel-frequency Cepstral Coefficients (ICMC) features are scored independently. The work is motivated by the expectation that fusing the MFCC and ICMC systems at the score level will yield a performance gain, since the two features carry complementary information. Experimental results obtained on the IITG Multivariability Speaker Recognition Phase-I and Phase-II Database show that the fused system outperforms the independently scored systems by a significant margin for all mismatches. The reported average relative improvements in identification accuracy over the MFCC baseline, for a 128-mixture Gaussian model, are 1.99% for language mismatch, 4.56% for session mismatch, 5.38% for language and session mismatch, 204.54% for sensor mismatch, and 175.3% for sensor and session mismatch. Experimental results are also obtained on IITG Multivariability Speaker Recognition Phase-III, which consists of truly conversational data collected over phone calls."
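Note: the summary does not say how the MFCC and ICMC scores are combined, so the following is only a minimal sketch of one common form of score-level fusion, not the thesis's method. It assumes each system returns one log-likelihood score per enrolled speaker, that scores are z-normalised per system, and that they are combined by a weighted sum. The function names, the weight of 0.5, and the example scores are hypothetical. The relative-improvement helper assumes the usual definition, (fused - baseline) / baseline.

```python
from typing import List, Sequence


def zscore(scores: Sequence[float]) -> List[float]:
    """Normalise one system's scores to zero mean and unit variance."""
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    if std == 0.0:
        # All scores identical: this system gives no ranking information.
        return [0.0] * len(scores)
    return [(s - mean) / std for s in scores]


def fuse_scores(mfcc_scores: Sequence[float],
                icmc_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted sum of normalised per-speaker scores (assumed fusion rule)."""
    if len(mfcc_scores) != len(icmc_scores):
        raise ValueError("both systems must score the same speakers")
    mfcc_norm = zscore(mfcc_scores)
    icmc_norm = zscore(icmc_scores)
    return [weight * m + (1.0 - weight) * c
            for m, c in zip(mfcc_norm, icmc_norm)]


def identify(scores: Sequence[float]) -> int:
    """Return the index of the speaker with the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def relative_improvement(baseline_acc: float, fused_acc: float) -> float:
    """Relative gain in percent over the baseline accuracy (assumed definition)."""
    return 100.0 * (fused_acc - baseline_acc) / baseline_acc


# Hypothetical log-likelihood scores for three enrolled speakers.
mfcc = [-41.0, -40.0, -45.0]   # the MFCC system alone prefers speaker 1
icmc = [-39.0, -44.0, -43.0]   # the ICMC system alone prefers speaker 0
fused = fuse_scores(mfcc, icmc)
print(identify(mfcc), identify(icmc), identify(fused))
```

Under this definition, a relative improvement above 100%, such as the 204.54% reported for sensor mismatch, means the fused accuracy is more than double the baseline accuracy. For example, relative_improvement(20.0, 60.0) gives 200.0 for a hypothetical rise from 20% to 60%.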


Holdings
Item type	Home library	Collection	Call number	Status	Date due	Barcode	Item holds
Thesis	S. R. Ranganathan Learning Hub Course Reserve	Reference	621.36 J837S	Not For Loan		TM00111

Total holds: 0

