TY  - BOOK
AU  - Sharma, Shreya
AU  - Mukerji, Mitali
TI  - ALU-map: A Natural Language Processing-Based Alu Feature Annotation on the Human Genome
U1  - 572.8
PY  - 2023///
CY  - IIT Jodhpur
PB  - Department of Bioscience and Bioengineering
KW  - Department of Bioscience and Bioengineering
KW  - BERT and BioBERT models
KW  - MTech Theses
N1  - This study aims to develop a database, "ALU-map", that structures information on the various roles of Alu elements using state-of-the-art Natural Language Processing (NLP) models such as BERT and BioBERT. We explored the performance of these models by training them on literature abstracts retrieved from the PubMed database. Each abstract was annotated against 10 biological labels; because a single abstract can carry information relevant to several of these labels, the task is one of multilabel classification. The study also aims to develop a fine-tuned BERT model that classifies Alu abstracts into these categories. Although the fine-tuned models perform well, they have key limitations, which we also discuss. Finally, we constructed a database in which every Alu abstract is annotated for the 10 categories: an abstract is assigned 1 for each category it belongs to and 0 otherwise. The database captures the involvement of Alu elements at the levels of biology where their functions have been reported: genetic, transcriptomic, proteomic, pathway, and biomarker. We believe this database can serve as a useful resource for researchers working in the field.
ER  -