Multimedia concept detection is a challenging topic because of the well-known class imbalance issue, especially in a big data environment. With the rapid growth of multimedia data such as audio, image, and video, as well as text data, powerful data mining approaches are needed to handle large and imbalanced datasets. To this end, this paper proposes an Importance Factor based Multiple Correspondence Analysis (IF-MCA) framework with a MapReduce implementation for large-scale datasets. Specifically, a Hierarchical Information Gain Analysis method, inspired by the decision tree algorithm, is combined with the Affinity Propagation (AP) algorithm to select critical features and to assign an Importance Factor (IF) according to the ordering of the selected features. The derived IF is then incorporated into the Multiple Correspondence Analysis (MCA) algorithm for effective concept detection and retrieval. Experiments on video event detection demonstrate the effectiveness of the proposed framework compared with current data mining approaches. Furthermore, the scalability of the proposed method under MapReduce is evaluated through experiments on training and classification time, which show the efficiency of MapReduce on MCA-based classifiers.
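As a rough illustration of the information gain criterion that decision tree algorithms use to order features, the following sketch scores a discrete feature by how much it reduces label entropy. It is not the Hierarchical Information Gain Analysis method itself, whose details are not given here; the function names `entropy` and `information_gain` and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two discrete features and a binary concept label.
labels = [1, 1, 0, 0]
informative = ["x", "x", "y", "y"]    # separates the classes perfectly
uninformative = ["x", "y", "x", "y"]  # tells us nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Features with higher gain would be placed earlier in an ordering of this kind; how such an ordering is turned into an Importance Factor is specific to the proposed framework and is not reproduced here.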
Currently, multimedia, including image, video, and audio, accounts for 60\% of internet traffic, 70\% of mobile phone traffic, and 70\% of all available unstructured
Data mining has become a popular technology for extracting interesting information from multimedia data sets, such as audio, video, images, graphics, speech, text, and combinations of several of these types. Multimedia data are unstructured or semi-structured. They are stored in multimedia databases, and multimedia mining is used to find information in large multimedia database systems using multimedia techniques and powerful tools. This paper explains the current approaches and techniques for mining multimedia data. It analyzes the essential characteristics of multimedia data mining, notes that information retrieval is one of the goals of data mining, and discusses different issues.
Thus, our proposed optimal feature selection, based on multi-level feature subset selection, produced better results in terms of the number of features selected and classifier performance. Future work will use these features to annotate image regions, so that the image retrieval system can retrieve relevant images based on image semantics.
The rapid increase in the volume of digital libraries, driven by cell phones, web cameras, digital cameras, and similar devices, calls for an expert system that can effectively retrieve images similar to a given query image [1]. A CBIR system is one such expert system; it relies heavily on the appropriate extraction of features and on the similarity measures used for retrieval [10]. The area has gained a wide range of attention from researchers investigating the adopted methodologies, their drawbacks, research scope, etc. [2-5, 14-18]. The diversity of image content has made this domain both complex and interesting [10].
Music and motion pictures, two of the most popular forms of entertainment today, can easily be traced back hundreds of years. Yet a relatively newer form of entertainment (and information) has impacted those long-established industries in as little as a few years: the Internet.
Abstract— Nowadays, with the increase in dreadful diseases, hospitals produce a huge amount of data, and it is growing exponentially day by day. Using these medical images after efficient classification plays a major role in case-based reasoning and supports clinical decision making. Therefore, it is important to classify these images and access them accurately. A modality classifier classifies medical images by modality. In our study, we analyzed spatial and spectral features of Magnetic Resonance (MR) images and Computed Tomography (CT) scans, and we also fused these features. We found that these modalities have different characteristics that help in classification. Initially, the images are preprocessed with a median filter; spatial and spectral features are then extracted and fused. The
Then, the data of the usage of different types of media resource indicates
The easily accessible Internet and the World Wide Web revolutionized information sharing. For the first time, data could be shared in real time as text, voice, graphics, and video among anyone with access to the Internet (Gomez).
Cordelli et al. [11] consider a heterogeneous set of texture features belonging to different categories: statistical descriptors, spectral measures, local binary patterns (LBP), and morphological descriptors.
The Internet is a fascinating place, jam-packed with things to read, watch, and download. According to the ISC Internet Domain Surveys, the number of registered website domains doubles every 2-3 years. Their last survey, conducted in January 2009, counted over 625 million registered domains. More than 1.7 billion people use the Internet, and from 2003 to 2006, after the creation of peer-to-peer (P2P) file sharing protocols, the majority of usage was downloading audio and video files over these networks. Since 2007, however, HTTP web traffic has retaken the number one position in bandwidth usage.
Traditional media embrace all the means of communication that existed before the Internet and modern media technology, including printed materials (books, magazines, and newspapers), broadcast communications (television and radio), film, and music. New media, on the other hand, include electronic video games and entertainment, the Internet, and social media. “With the creation of the World Wide Web in the 1980s and the introduction of commercial browsers in the 1990s, users gained the ability to transmit pictures, sound, and video over the Internet” (Internet Usage Statistics).
The Internet in particular is becoming more and more important for nearly everybody, as it is one of the newest and most forward-looking media and surely "the" medium of the future.
Tekin et al. (2013) proposed a method based on context information to improve big data classification so that conceptual information can be derived. The authors applied it to a large, distributed, heterogeneous dataset. Because the data are collected from multiple streams, a function-driven classifier is applied to handle the complexity of each individual stream. A local-perspective method is defined within the learning method to reduce cost, account for the benefits to each learner, and provide contextual results. The work also provides data characterization with an improved mining model [5].
The correlation coefficient approach evaluates how well an individual feature contributes to the separation of classes. A ranking criterion ranks all features using their means and standard deviations over the samples of both classes. The correlation coefficient successfully reduced the number of features while maintaining good classification accuracy (Zong-Xia et al., 2006).
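A ranking of this kind can be sketched as follows. The exact formula is not stated above, so this example assumes the commonly used form in which each feature is scored by the difference of its class means divided by the sum of its class standard deviations; the function names, the use of population standard deviation, and the toy data are all illustrative assumptions.

```python
from math import copysign
from statistics import mean, pstdev


def correlation_scores(samples, labels):
    """Score each feature as (mean_pos - mean_neg) / (std_pos + std_neg).

    Assumed form of the criterion; labels are 1 (positive) or 0 (negative).
    """
    pos = [row for row, y in zip(samples, labels) if y == 1]
    neg = [row for row, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes need at least one sample")
    scores = []
    for j in range(len(samples[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        diff = mean(p) - mean(n)
        spread = pstdev(p) + pstdev(n)
        if spread == 0:
            # Feature is constant within each class: perfect or useless.
            scores.append(copysign(float("inf"), diff) if diff != 0 else 0.0)
        else:
            scores.append(diff / spread)
    return scores


def rank_features(samples, labels):
    """Return feature indices ordered from most to least discriminative."""
    scores = correlation_scores(samples, labels)
    return sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
samples = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
labels = [0, 0, 1, 1]
scores = correlation_scores(samples, labels)
ranking = rank_features(samples, labels)
```

Keeping only the top-ranked indices from `ranking` gives a reduced feature set; how many to keep is a choice the source does not specify.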
In this dissertation a multimedia big data analysis framework for semantic information management and retrieval is presented. It contains three coherent components, namely multimedia semantic representation, multimedia concept classification and summarization, and multimedia temporal semantics analysis and ensemble learning. These three components are seamlessly integrated and act as a coherent entity to provide essential functionalities in the proposed information management and retrieval framework. More specifically:
Recently, we developed another efficient method for matching areas in remote sensing images using our Contourlet-based key points together with a simple descriptor. A matching region was formed by the convex hull of the key points matched in both images. These regions could be used for matching, fusion, and registration of remote sensing images.
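The convex hull step can be illustrated with a short sketch. The hull algorithm used in the method is not stated, so this example assumes Andrew's monotone chain algorithm; the function names and the key point coordinates are hypothetical, and key point detection and matching are taken as already done.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2-D points, counter-clockwise, via monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical matched key point coordinates (x, y) in one image.
matched_keypoints = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
region = convex_hull(matched_keypoints)
```

Here `region` holds the vertices of the polygon enclosing all matched key points; applying the same function to the corresponding key points in the second image would give the paired region.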