Analysis Of SVM And CRF Based Tagger

When the percentage of unknown words (shown in Table 2) is taken into consideration, the SVM based tagger provides the highest accuracies when the testing data contains more unknown words (E2 and E4), whereas the CRF based tagger provides the highest accuracies when the testing data contains fewer unknown words (E1, E3 and E5). This shows that SVM is more robust to unknown words. Moreover, when testing is done on a different domain (E4 and E5), the SVM and CRF based taggers provide the highest accuracies in E4 and E5, respectively. This suggests that both the SVM and CRF based taggers are comparatively robust to domain adaptation. Therefore, based on the results of our experiments, no general conclusion can be made that a single tagger performs best for the Sinhala language. …
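The excerpt does not say how the unknown word percentage in Table 2 was computed. As a minimal sketch, assuming it is the share of testing tokens that never occur in the training corpus, it could be calculated as follows. The function name and the toy token lists are hypothetical and are not taken from the Sinhala corpora.

```python
def unknown_word_percentage(train_tokens, test_tokens):
    """Percentage of test tokens that never appear in the training tokens.

    Assumption: counting is done per token occurrence, not per distinct word.
    """
    vocabulary = set(train_tokens)
    if not test_tokens:
        return 0.0
    unknown = sum(1 for token in test_tokens if token not in vocabulary)
    return 100.0 * unknown / len(test_tokens)


# Hypothetical toy data for illustration only.
train = ["the", "cat", "sat", "on", "the", "mat"]
test = ["the", "dog", "sat", "on", "a", "mat", "the", "cat", "sat", "on"]

print(unknown_word_percentage(train, test))  # 20.0 ("dog" and "a" are unseen)
```

Counting distinct word types instead of token occurrences would give a different figure, so the definition matters when comparing experiments.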

The same observation is made for all other experiments as well. Therefore, we cannot confidently draw a conclusion on the best ensemble tagger setup. However, every ensemble tagger outperforms every individual tagger. Next, the experiment results confirm that accuracy decreases when the training and testing phases use corpora from different domains. For example, for the individual SVM based tagger, the best accuracy of 88.24% is achieved when both training and testing use a combination of Official Documents and News (E3). When the tagger is trained on News and tested on Official Documents (E5), the accuracy is 82.01%, a drop of 6.23 percentage points. However, we should consider the properties of the training and testing corpora (percentage of unknown words, size of corpora) in these two experiments before drawing a general conclusion. The percentages of unknown words differ between E3 and E5: the E5 testing corpus has 10% unknown words, while the E3 testing corpus has only 5%. To draw a better conclusion, we can compare E5 with E2, in which training and testing both use the same domain corpus of News and the testing corpus has 11% unknown words. E2 has obtained a tagging accuracy of 88.14%, making a decrement of
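The comparison in this paragraph is a subtraction of accuracies, expressed in percentage points. The sketch below uses only the SVM figures quoted in the text (accuracy and unknown word percentage for E2, E3 and E5); the dictionary layout and the function name are illustrative assumptions, not part of the original study.

```python
# SVM tagger figures quoted in the text: accuracy (%) and unknown words (%).
experiments = {
    "E2": {"accuracy": 88.14, "unknown": 11},  # train and test on News
    "E3": {"accuracy": 88.24, "unknown": 5},   # Official Documents + News
    "E5": {"accuracy": 82.01, "unknown": 10},  # train on News, test on Official Documents
}


def accuracy_drop(baseline, other):
    """Difference in accuracy, in percentage points, rounded to two decimals."""
    diff = experiments[baseline]["accuracy"] - experiments[other]["accuracy"]
    return round(diff, 2)


def unknown_gap(a, b):
    """Absolute difference in unknown word percentage between two experiments."""
    return abs(experiments[a]["unknown"] - experiments[b]["unknown"])


print(accuracy_drop("E3", "E5"))  # 6.23, as stated in the text
print(unknown_gap("E3", "E5"))    # 5
print(unknown_gap("E2", "E5"))    # 1
print(accuracy_drop("E2", "E5"))
```

Because E2 and E5 differ by only one point in unknown word percentage, the E2 versus E5 difference isolates the domain effect better than the E3 versus E5 difference does.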
