000 02860nam a22001697a 4500
082 _a621.36
_bJ837S
100 _aJoshi, Sonal
_926104
245 _aScore Fusion for Speaker Identification using MFCC and ICMC Features
_cby Sonal Joshi
260 _aIIT Jodhpur
_bDepartment of Electrical Engineering
_c2017
300 _axi, 58 p.
_bHB
520 _a"The task of Speaker Identification (SID) or Speaker Recognition is to recognise the person from a given speech utterance. That means to answer the question, ”Whose voice is this?” An important application of SID is in forensics to verify the identity of a suspect. Other than forensic applications, this technology is used to improve the performance of speech recognition, automatically adjust preferences as per personal needs like in home automation and identify the speaker in each segment of a teleconference or newsroom discussion (Speaker Diarization). Even though real world applications demand robustness against various possible practical and realistic conditions, generally SID systems have poor performance when there is a mismatch. Different recording conditions in training and testing data lead to mismatch, which can be in the form of language mismatch, session mismatch, sensor mismatch or any combination of the above. To improve speaker recognition performance in mismatch scenarios, score fusion of log-likelihood scores obtained using Gaussian Mixture Model - Universal Background Model (GMM-UBM) classifier is explored in this work. After an initial study of commonly used features using TIMIT database, GMM-UBMs using Mel Frequency Cepstral Coefficients (MFCC) and recently proposed Infinite impulse response Constant Q Mel-frequency cepstral Co-efficients (ICMC) features are scored independently. This work is motivated by the fact fusion of systems using MFCC and ICMC at the score level will lead to performance gain as both the features have complementary information. Experimental results, obtained using IITG Multivariability Speaker Recognition Phase-I and Phase-II Database, prove that the fusion results outperform the independently scored results by a significant margin for all mismatches. Reported average relative improvements in identification accuracy over baseline MFCC in percent for 128 mixture gaussian are 1.99% for language mismatch, 4.56% for session mismatch, 5.38% for language and session mismatch, 204.54% for sensor mismatch, and 175.3% for sensor and session mismatch. Experimental results are also obtained using IITG Multivariability Speaker Recognition Phase-III which is a truly conversational data collected over phone call. i"
650 _aMFCC and ICMC Features
_926105
650 _aMTech Theses
_926106
650 _aDepartment of Electrical Engineering
_926107
700 _aYadav, Sandeep Kumar
_926108
942 _cTH
999 _c14688
_d14688