ABSTRACT
Data mining has become a popular technology for extracting interesting information from multimedia data sets, such as audio, video, images, graphics, speech, text, and combinations of several types of data. Multimedia data are unstructured or semi-structured. These data are stored in multimedia databases, and multimedia mining is used to find information in large multimedia database systems using multimedia techniques and powerful tools. The current approaches and techniques for mining multimedia data are explained. This paper analyzes the essential characteristics of multimedia data mining, notes that retrieving information is one of the goals of data mining, and discusses the different issues involved.
Keywords: Data Mining, Multimedia Mining, Clustering, Classification.
1. INTRODUCTION
1.1 Multimedia data mining: Multimedia data mining, shown in Fig. 1, is a subfield of data mining that is used to find interesting information of implicit knowledge. Multimedia data are classified into five types: (i) text data, (ii) image data, (iii) audio data, (iv) video data, and (v) electronic and digital ink [2]. Text data are used in web browsers and in messages such as MMS and SMS. Image data are used in artwork and pictures with text, and include still images taken by a digital camera. Audio data contain sound, MP3 songs, speech, and music. Video data include time-aligned sequences of frames and MPEG videos from desktops, cell phones, and video cameras. Electronic and digital ink
Data mining is a very important component of today’s big data [22, 23]. It is essential for everyone from large businesses to government organizations. It helps to identify trends and patterns and to make predictions by exploring, comparing, researching, and analyzing data.
4) Technically speaking, data mining is a process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and
In a world where computers are becoming as essential to daily life as the cars we drive or the telephones we use to communicate, it is difficult to find a person who doesn’t have some particular use for computers. Computers have become the information stores of the world. If you take a moment to think about all the kinds of information a person can and does hold on their computer, it is staggering. I myself have all the passwords to my email and bank accounts, the history of every web page I’ve visited in the last 3 weeks, my credit card numbers, and the complete history of all my banking transactions for the last three years stored on my computer. Additionally, think about all the
Today, with the ever-growing use of computers in the world, information is constantly moving from one place to another. What this information is, who it is about, and who is using it will be discussed in the following paper. The collection and interpretation of this information, and the determination of its use, have come to be known as data mining. The term data mining has been around only for a short time, but the actual collection of data has been happening for centuries. The following paragraph gives a brief description of this history of data collection.
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
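To make "finding correlations among fields" concrete, here is a minimal sketch in Python. It assumes a toy in-memory table stored as a dictionary of numeric columns; the field names, the sample values, and the 0.8 threshold are invented for illustration and do not come from any real database. It uses the Pearson correlation coefficient, which is only one of many possible measures.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns.

    Assumes neither column is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_fields(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    results = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            results.append((a, b, round(r, 3)))
    return results


# Hypothetical table: each key is a field, each list is that field's column.
sales = {
    "ad_spend": [10, 20, 30, 40, 50],
    "revenue": [25, 45, 65, 85, 105],
    "returns": [5, 3, 6, 2, 4],
}

for field_a, field_b, r in correlated_fields(sales):
    print(f"{field_a} ~ {field_b}: r = {r}")
```

In this toy table only the ad_spend and revenue pair passes the threshold. A real system would run such a scan inside the database or over samples, since comparing every pair of fields grows quadratically with the number of fields.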
Data mining is the process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems, with the aim of extracting information and transforming it into an understandable structure for further use.
In its infancy, data mining was as limited as the hardware being used. Large amounts of data were difficult to analyze because the hardware simply could not handle them [1]. The term "data mining" first began appearing in the 1980s, largely within the research and computer science communities. In the 1990s it was considered a subset of a process called Knowledge Discovery in Databases, or KDD [1]. KDD analyzes data in the search for patterns that may not normally be recognized with the naked eye. Today, however, data mining does not limit itself to databases,
Data mining is used in a variety of fields and applications (Galit, Stiumueli, Natin & Peter 2010). This includes the military for purposes of intelligence,
Because text is probably the most natural form of storing information, text mining is believed to have a commercial potential greater than that of data mining. A recent study indicated that 80% of a company’s information is contained in text documents. Text mining, however, is also a more complex task than data mining, as it involves dealing with text data that are naturally unstructured and fuzzy. Text mining is a multidisciplinary area, involving information retrieval, text analysis, information extraction, clustering, categorization, visualization, database technology, machine learning, and data mining [2]. In text mining, patterns are extracted from natural language textual databases. There are many methods of text mining. In general, the major approaches, based on the kinds of data they take as input,
This report focuses on the various advantages associated with the adoption and implementation of data warehousing and data mining technologies at Di Stefano cafe. Specifically, the adoption of these technologies implies that the managers need extra training on their effective use; that the cafe will use consumer information to discover knowledge about consumption and spending patterns; that the consumer data will be used in accordance with the stipulated guidelines to ensure data integrity, availability, confidentiality, and privacy; and that the use of the technologies will give Di Stefano cafe a competitive advantage in terms of value co-creation and differentiated products. However, in order to effectively leverage the benefits associated with data mining and data warehousing technologies, Di Stefano cafe needs to train employees and managers on the new system, develop policies, and implement security protocols that will ensure and promote
A data stream is a real-time, continuous, structured sequence of data items. Mining data streams is the process of extracting knowledge from continuous, rapid data records. Because the data arrive so quickly, mining them is a very difficult task. Stream mining algorithms typically need to be designed so that they work with a single pass over the data. Data streams pose a computational challenge for data mining because of the additional algorithmic constraints created by the large volume of data. In addition, the problem of temporal locality leads to a number of unique mining challenges in the data stream case. Data mining techniques, namely clustering, classification, and frequent pattern mining, are applied to extract the knowledge
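The single-pass constraint described above can be illustrated with a small Python sketch. It is not one of the clustering, classification, or frequent pattern mining techniques the passage names. It is a simpler stand-in, Welford's online algorithm, chosen here as an assumption because it shows the core idea: each item is seen once, summarized in constant memory, and then discarded. The class name and the sample stream are hypothetical.

```python
class RunningStats:
    """Mean and variance of a stream, updated one item at a time.

    Uses Welford's online algorithm: memory use is constant and no item
    is ever stored or revisited, which matches the one-pass constraint.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new stream item into the summary."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of the items seen so far (0.0 if empty)."""
        return self._m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume any iterable exactly once and return its running summary."""
    stats = RunningStats()
    for item in stream:
        stats.update(item)
    return stats


# Hypothetical stream; a generator is used so the data cannot be re-read.
readings = (x for x in [2, 4, 4, 4, 5, 5, 7, 9])
summary = summarize(readings)
print(summary.count, summary.mean, summary.variance)
```

The same pattern of a small state object with an `update` method underlies many stream algorithms. Only the contents of the state change, for example counters for frequent items or centroids for clustering.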
Multimedia data mining is used for extracting interesting information from multimedia data sets, such as audio, video, images, graphics, speech, text, and combinations of several types of data. Multimedia mining is a subfield of data mining that is used to find interesting information of implicit knowledge from multimedia databases. Multimedia data are classified into five types: (i) text data, (ii) image data, (iii) audio data, (iv) video data, and (v) electronic and
Many real-life applications require the ability to decide whether a new set of observations follows the same distribution as the earlier observations in a time series. In many application domains, this ability is considered a milestone and a watershed in the decision-making process. Business and research sectors such as medicine, finance, IT, cyber security, and even crime investigation and counter-terrorism are interested in investing in this field to gain the ability to detect unusual behavior in real time.
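As a rough illustration of this decision, the Python sketch below compares the mean of a new window of observations with a baseline window using a z-score. This is only one simple way to frame the problem and is an assumption made here, not a method taken from the text. It detects shifts in the mean only and assumes the baseline has non-zero spread. The function name, the threshold of 3, and the sample values are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def looks_unusual(baseline, new_window, z_threshold=3.0):
    """Flag a new window whose mean is far from the baseline mean.

    The z-score measures how many standard errors separate the two means.
    Assumes the baseline has at least two points and a non-zero standard
    deviation, and that a mean shift is the kind of change of interest.
    """
    base_mean = mean(baseline)
    base_std = stdev(baseline)
    standard_error = base_std / sqrt(len(new_window))
    z = (mean(new_window) - base_mean) / standard_error
    return abs(z) > z_threshold


# Hypothetical readings from a monitored time series.
history = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]

print(looks_unusual(history, [10, 9, 11, 10]))   # window close to the baseline
print(looks_unusual(history, [15, 16, 14, 15]))  # window shifted upward
```

A production detector would usually compare whole distributions rather than means alone and would update the baseline as the series evolves.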
Multimedia concept detection is a challenging topic due to the well-known class imbalance issue, especially in a big data environment. With the rapid growth of multimedia data such as audio, image, and video, as well as text data, applying powerful data mining approaches is necessary to tackle the issues of large and imbalanced datasets. For this purpose, this paper proposes an Importance Factor based Multiple Correspondence Analysis (IF-MCA) framework with a MapReduce implementation for dealing with large-scale datasets. Specifically, a Hierarchical Information Gain Analysis method inspired by the decision tree algorithm is combined with the Affinity Propagation (AP) algorithm for critical feature selection and Importance Factor (IF) assignment according to the ordering of