Morphological Analysis for Hindi Language. by Prity Goyal
Material type: Text
Publication details: IIT Jodhpur, Department of Computer Science & Engineering, 2020
Description: ix, 38 p.; HB
DDC classification: 006.304 914 G724M
Item type | Home library | Collection | Call number | Status | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|---|
Thesis | S. R. Ranganathan Learning Hub Course Reserve | Reference | 006.304 914 G724M | Not For Loan | | TM00202 | |
"Morphological analysis is the process of providing grammatical information about the word on the basis of properties of the morpheme it contains. It plays a vital role in Natural Language Processing (NLP) and ease the job of machine translation.In this work, we have developed a morphological analyzer for the Hindi language.The analyzer takes Hindi words as input and divides it into its prefixes,suffixes, and roots/bases. The analyzer also provides the details of its grammatical feature/categories like number (singular/plural) and gender masculine/feminine).It is rule-based analyzer and works well with both the inflectional and derivationalmorphemes.Stemming is the process of trimming of suffix and prefix from the input word to get the corresponding root word. Several times, merely trimming the affixes do not always yield in a correct stemmed word. Lemmatizers are commonly used to overcome this challenge. A typical lemmatizer extract the lemma from the given word and adds special rules to make the trimmed word a correct stem. In this work,we have designed an inflectional lemmatizer that creates rules for extracting the suffixes, prefixes, and additional rules for making a correct root word.We also present an approach to identify the gender from the first name of a person.The gender classification is done by identifying similarities from masculine or feminine name. We created a data-set containing masculine and feminine names.Decision tree is used to categorized names into masculine and feminine classes. The same approach has been used for number classification.The building of a derivational analyzer requires information about the derivational variants. To extract the essential features of a derived word, the derivational variants in the language should be known, and then they must be analyzed. Therefore,we have trained a model for classifying Hindi derivational variants using supervised approach. 
To this end, total 11 derivational suffixes have been used in verb-to-noun derivations. We have used an (support vector machine) SVM classifier with seven features and 400-word pair training data for our classifier. This classifier is used to find out whether the derivational relationship exist or not between word pair.
"