preview

What Is The Process Of Applying Machine Learning For A Problem

Decent Essays

The process of applying machine learning for a problem is usually a two phase process, the training phase which involves learning meaningful models using the training data and the testing phase where the learned models are evaluated on an unseen dataset to estimate how well they perform. Since we are interested in classification problems in this work, this would involve training a classifier and then obtaining accuracy of classifier on test data. Labeled data is required in both phases. Labeling data is a tedious and expensive procedure, often requiring manual processing. Hence, it is desirable to reduce the amount of labeling effort as much as possible. There have been concrete efforts to reduce the dependence on labeled data for training …show more content…

How the classifier was trained is immaterial in this problem. It is also worth noting that this problem is completely different from cross validation or any such method employed to measure the goodness of classifier during training phase. Once again the classifier training process is of no relevance for us, our goal is to accurately estimate the accuracy of a given trained classifier on a test set with as little labeling effort as possible. A trained classifier is almost always applied on a dataset which was never available during training and estimating performance on this dataset requires it to labeled. Moreover, a classifier may be deployed into some real world application where test data can be extremely large and hence not possible to label all of them. Considering the importance of this problem, very little efforts have been made to address the constraints posed by labeling costs during classifier evaluation phase. Very few works have looked into it. Some attempts have been made towards unsupervised evaluation of multiple classifiers [12, 16, 17, 7]. Although, unsupervised evaluation sounds very appealing, these methods are feasible only if multiple classifiers are present. In contrast, our focus is on the more general and practical case where the goal is to estimate the accuracy of a single classifier without the aid of any other classifier. Since, the labeling resources are limited, the problem now boils down to sampling instances for labeling

Get Access