
Multi Task Impersonation Prevention Face Recognition Network (MTIPFRNet) and Layer Wise Triplet Loss by Sushil Kumar Surana

Material type: Text
Publication details: IIT Jodhpur, Department of Computer Science and Technology, 2023
Description: x, 45 p.; HB
DDC classification: 006.4 U721M
Holdings
Item type: Thesis
Home library: S. R. Ranganathan Learning Hub
Collection: Reference Theses
Call number: 006.4 U721M
Status: Not for loan
Barcode: TM00527

Recent work in face recognition has focused on achieving high accuracy, with considerable success: some face recognition algorithms now surpass even human performance on unseen data. Recent literature has also addressed face identification and recognition under facial obstructions, whether intentional or unintentional, such as masks and makeup.

In this work, we propose a novel multi-task framework for face recognition that aims to strengthen the ability of AI systems to match faces and to lower their tolerance for impersonation, given its critical applicability in areas requiring high secrecy, authenticity, and confidentiality. We divide the face into several regions, such as the mouth, periocular, and nose regions, while also leveraging the capability of existing systems that operate on features extracted from the complete face. The proposed framework comprises four primary sub-tasks: facial feature matching, periocular feature matching, nose feature matching, and mouth feature matching. A separately trained model of similar architecture is used for each of these features. For final authentication or verification, a distance is computed between each pair of feature embeddings, and these four distances are fed into an ensemble of classifiers that makes the final match decision. The proposed pipeline attains an accuracy of 90.4%, approximately 1% higher than the best state-of-the-art accuracy on the private dataset under consideration. An ablation study on this dataset shows a drop of at least 0.8 percentage points when any one of the four feature embeddings is removed. Moreover, removing any of these four tasks widens the accuracy gap between matching and non-matching pairs; that is, such removal induces bias.
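The verification step described above can be sketched roughly as follows. This is a minimal, illustrative stand-in: the region names, the Euclidean distance, the per-region thresholds, and the majority vote are assumptions standing in for the thesis's actual region models and trained ensemble of classifiers, which are not specified here.

```python
import math

def euclidean(u, v):
    # Distance between two embedding vectors of equal length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

REGIONS = ("face", "periocular", "nose", "mouth")

def verify(emb_a, emb_b, thresholds, min_votes=3):
    """Decide whether two faces match.

    emb_a, emb_b: dicts mapping region name -> embedding vector, each
    produced by that region's separately trained model (hypothetical).
    thresholds:   per-region distance thresholds, an illustrative
                  stand-in for the trained ensemble of classifiers.
    Returns (match_decision, per-region distances).
    """
    distances = {r: euclidean(emb_a[r], emb_b[r]) for r in REGIONS}
    votes = sum(distances[r] <= thresholds[r] for r in REGIONS)
    return votes >= min_votes, distances
```

For example, two nearly identical sets of region embeddings fall under every threshold and collect all four votes, so `verify` reports a match; widely separated embeddings collect none and are rejected.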

This research also proposes a Layer-Wise Triplet Loss, motivated by the class-independent nature of triplet and contrastive losses. The proposed approach trains a neural network with a loss attached to each layer; the loss at any layer affects that layer and all layers preceding it. In this way, every layer learns not only from the final output of the network but also from all internal layers. Learning with losses at every internal layer expedites training, and the model converges in very few epochs. In a deep neural network, lower layers exhibit better localization properties, while deeper layers tend to be more class-specific. This pattern persists under layer-wise training; however, because the proposed approach uses loss functions that are independent of class representation, each layer is expected to generate features that are more representative for the final decision in verification or matching tasks. To avoid overfitting with the layer-wise loss function, we recommend a dropout rate of around 50% at each layer.
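The layer-wise scheme above can be sketched by collecting one triplet loss per layer as a triplet propagates through the network. This is a simplified sketch under stated assumptions: the layers are plain callables, the margin of 0.2 is illustrative, and the gradient bookkeeping (each layer's loss backpropagating through that layer and all preceding layers) is noted in comments rather than implemented, since an autograd framework would normally handle it.

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Class-independent triplet loss: pull anchor toward the positive,
    # push it away from the negative by at least `margin`.
    d = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

def layer_wise_losses(layers, anchor, positive, negative, margin=0.2):
    """Collect one triplet loss per layer.

    `layers` is a list of callables applied in order. The loss computed
    after layer l is intended to be backpropagated through layers 1..l,
    so every layer receives gradient signal from its own loss and from
    the losses of all deeper layers.
    """
    losses = []
    a, p, n = anchor, positive, negative
    for layer in layers:
        a, p, n = layer(a), layer(p), layer(n)
        losses.append(triplet_loss(a, p, n, margin))
    return losses
```

With two toy "layers" (identity and a doubling map) and a triplet whose negative sits just beyond the positive, the function returns one loss value per layer, each of which would drive updates in its own layer and every layer before it.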

Random variations in brightness, color jitter, rotation, and noise injection also help avoid overfitting and increase the generalizability of the model. Weighting the layer-wise losses in increasing proportion to depth ensures that the lower layers of the model learn not only from their own outputs but also from the outputs of deeper layers. Experiments indicate that the proposed training approach with the layer-wise loss function achieves accuracy comparable to the state of the art while being more generalizable and requiring very few training iterations. The proposed approach thus saves both training time and training resources, and is therefore more environment-friendly.
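The two ideas above, depth-increasing loss weights and simple photometric augmentation, can be sketched as follows. The linear weight schedule w_l = l / L and the augmentation ranges are assumptions for illustration; the thesis does not fix a particular schedule, and a real pipeline would apply rotation and color jitter through an image library rather than on a flat pixel list.

```python
import random

def depth_weighted_total(layer_losses):
    # Combine per-layer losses with weights that grow with depth,
    # assuming a linear schedule w_l = l / L (illustrative choice).
    L = len(layer_losses)
    return sum((l + 1) / L * loss for l, loss in enumerate(layer_losses))

def augment(pixels, brightness_range=0.2, noise_std=0.02, seed=None):
    # Random brightness scaling plus Gaussian noise on a flat list of
    # pixel intensities in [0, 1]; results are clamped back to [0, 1].
    rng = random.Random(seed)
    scale = 1.0 + rng.uniform(-brightness_range, brightness_range)
    return [min(1.0, max(0.0, p * scale + rng.gauss(0.0, noise_std)))
            for p in pixels]
```

Under this schedule the deepest loss carries full weight while the shallowest carries 1/L, so shallow layers are still updated by every deeper loss without letting their own (noisier) losses dominate training.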
