Score Fusion for Speaker Identification using MFCC and ICMC Features (Record no. 14688)

MARC details
000 -LEADER
fixed length control field 02860nam a22001697a 4500
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 621.36
Item number J837S
100 ## - MAIN ENTRY--AUTHOR NAME
Personal name Joshi, Sonal
245 ## - TITLE STATEMENT
Title Score Fusion for Speaker Identification using MFCC and ICMC Features
Statement of responsibility, etc by Sonal Joshi
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication IIT Jodhpur
Name of publisher Department of Electrical Engineering
Year of publication 2017
300 ## - PHYSICAL DESCRIPTION
Number of Pages xi, 58 p.
Other physical details HB
520 ## - SUMMARY, ETC.
Summary, etc "The task of Speaker Identification (SID), or Speaker Recognition, is to recognise the person from a given speech utterance, that is, to answer the question “Whose voice is this?” An important application of SID is in forensics, to verify the identity of a suspect. Beyond forensics, the technology is used to improve the performance of speech recognition, to adjust preferences automatically to personal needs (as in home automation), and to identify the speaker in each segment of a teleconference or newsroom discussion (Speaker Diarization). Even though real-world applications demand robustness against a variety of practical conditions, SID systems generally perform poorly when there is a mismatch. Different recording conditions in the training and testing data lead to mismatch, which can take the form of language mismatch, session mismatch, sensor mismatch, or any combination of these. To improve speaker recognition performance in mismatch scenarios, this work explores score fusion of log-likelihood scores obtained using a Gaussian Mixture Model - Universal Background Model (GMM-UBM) classifier. After an initial study of commonly used features on the TIMIT database, GMM-UBMs using Mel Frequency Cepstral Coefficients (MFCC) and the recently proposed Infinite impulse response Constant Q Mel-frequency Cepstral Coefficients (ICMC) features are scored independently. This work is motivated by the expectation that fusing systems using MFCC and ICMC at the score level will yield a performance gain, as the two features carry complementary information. Experimental results obtained using the IITG Multivariability Speaker Recognition Phase-I and Phase-II Database show that the fusion results outperform the independently scored results by a significant margin for all mismatches. The reported average relative improvements in identification accuracy over the MFCC baseline, with 128 Gaussian mixtures, are 1.99% for language mismatch, 4.56% for session mismatch, 5.38% for language and session mismatch, 204.54% for sensor mismatch, and 175.3% for sensor and session mismatch. Experimental results are also obtained using IITG Multivariability Speaker Recognition Phase-III, which is truly conversational data collected over phone calls."
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical Term MFCC and ICMC Features
Topical Term MTech Theses
Topical Term Department of Electrical Engineering
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Yadav, Sandeep Kumar
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type Thesis
Holdings
Withdrawn status
Lost status
Damaged status
Not for loan Not For Loan
Collection code Reference
Permanent Location S. R. Ranganathan Learning Hub
Current Location S. R. Ranganathan Learning Hub
Shelving location Course Reserve
Date acquired 2024-01-18
Full call number 621.36 J837S
Accession Number TM00111
Price effective from 2024-01-18
Koha item type Thesis