Multimedia Concept Detection For A Big Data Environment

5682 Words Jul 5th, 2015 23 Pages
Multimedia concept detection is a challenging topic because of the well-known class imbalance issue, especially in a big data environment. With the rapid growth of multimedia data such as audio, image, and video, as well as text data, powerful data mining approaches are needed to handle large and imbalanced datasets. For this purpose, this paper proposes an Importance Factor based Multiple Correspondence Analysis (IF-MCA) framework with a MapReduce implementation for dealing with large-scale datasets. Specifically, a Hierarchical Information Gain Analysis method, inspired by the decision tree algorithm, is combined with the Affinity Propagation (AP) algorithm to select critical features and to assign each an Importance Factor (IF) according to its position in the ordering of selected features. The derived IF is then incorporated into the Multiple Correspondence Analysis (MCA) algorithm for effective concept detection and retrieval. Experimental analysis on video event detection demonstrates the effectiveness of the proposed framework compared with current data mining approaches. Furthermore, the scalability of the proposed method using MapReduce is evaluated through experiments on training and classification time, which show the efficiency of MapReduce on MCA-based classifiers.
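To make the feature-ordering idea concrete, here is a minimal sketch of ranking discrete features by plain information gain, the quantity decision tree algorithms use to choose splits. It is not the paper's Hierarchical Information Gain Analysis, and it omits the AP clustering, the IF assignment, and the MapReduce implementation; the function names (`entropy`, `information_gain`, `rank_features`) and the toy data are assumptions made only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest information gain.

    `columns` is a hypothetical mapping: feature name -> list of discrete values.
    """
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Toy data (illustrative only): 'a' separates the classes, 'b' does not.
labels = [1, 1, 0, 0]
columns = {
    "a": ["x", "x", "y", "y"],
    "b": ["x", "y", "x", "y"],
}
ranking = rank_features(columns, labels)
```

In a framework like the one described, an ordering of this kind could serve as the basis for assigning larger importance factors to more discriminative features; how the paper actually maps the ordering to IF values is not stated in this excerpt.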

Currently, multimedia including image, video, and audio accounts for 60% of internet traffic, 70% of mobile phone traffic, and 70% of all available unstructured…