A Brief Note On Image Based QA On The Video And Audio Retrieval
C. Image Based QA: An image-based QA approach has been introduced that focuses mainly on finding information about physical objects. An image-based QA system allows direct use of an image that depicts the object. Systems of this type were designed to find multimedia answers from web-scale media resources such as Flickr and Google Images.
D. Multimedia QA Search: With the increasing amount of digital information stored on the web, searching for desired information has become an essential task. Research in this area began in the early 1980s. With the rapid growth of content analysis technology in the 1990s, these efforts quickly expanded to tackle video and audio retrieval problems. Fig. 3 shows an example of multimedia QA (MMQA).
Fig. 3: …
Li and Roth developed a machine learning approach that uses the SNoW learning architecture to classify questions into six coarse classes and 50 fine classes.
They used lexical and syntactic features such as part-of-speech tags, chunks, and head chunks, together with two semantic features, to represent the questions.
Zhang and Lee used linear SVMs with all possible question word n-grams to perform question classification. Arguello et al. investigated medium type selection as well as search sources for a query. The approach analyzes the question, the answer, and multimedia search performance, and then learns a linear SVM model for classification based on the results.
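The word n-gram representation mentioned above can be illustrated with a minimal sketch. It only shows how every contiguous word n-gram of a question might be enumerated as features. The function name `word_ngrams`, the lowercase regex tokenizer, and the `max_n` parameter are assumptions made for illustration and are not taken from Zhang and Lee. Training the linear SVM on these features is left out.

```python
import re
from collections import Counter


def word_ngrams(question, max_n=None):
    """Return every contiguous word n-gram of the question as a tuple.

    Hypothetical helper: tokenization is a simple lowercase regex split,
    and max_n (default: question length) caps the n-gram order.
    """
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    if max_n is None:
        max_n = len(tokens)
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[start:start + n]))
    return grams


def ngram_counts(question, max_n=None):
    """Bag-of-n-grams feature vector as a Counter (sparse counts)."""
    return Counter(word_ngrams(question, max_n))


features = ngram_counts("Who wrote Hamlet?")
```

A sparse count vector like `features` is the kind of input a linear classifier could consume after mapping each n-gram to a column index.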
1.1 Question-Based Classification: Many questions contain multiple sentences, and some of those sentences are uninformative, so the classification is accomplished in two steps. First, we categorize questions based on interrogatives; second, for the remaining questions, we perform a classification using a naive…
Table I Representative interrogative words
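The two-step scheme in 1.1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The interrogative list is hypothetical (the entries of Table I are not reproduced here), the labels "text", "text+image", and "text+video" are assumed, and `keyword_fallback` is only a placeholder standing in for the learned second-stage classifier.

```python
import re

# Hypothetical interrogatives assumed to signal a text-only answer.
# These are NOT the entries of Table I.
TEXT_ONLY_INTERROGATIVES = ("when", "can", "will", "is", "are", "does")


def first_word(question):
    """Return the first lowercase word of the question, or '' if none."""
    tokens = re.findall(r"[a-z']+", question.lower())
    return tokens[0] if tokens else ""


def keyword_fallback(question):
    """Placeholder for the learned second-stage classifier (assumed)."""
    q = question.lower()
    if "how to" in q:
        return "text+video"
    if "look like" in q:
        return "text+image"
    return "text"


def classify_question(question, fallback=keyword_fallback):
    """Step 1: decide by interrogative; step 2: defer to the fallback."""
    if first_word(question) in TEXT_ONLY_INTERROGATIVES:
        return "text"
    return fallback(question)


label = classify_question("How to tie a bowline knot?")
```

Keeping the second stage behind a `fallback` callable means a trained model could replace the keyword placeholder without touching the interrogative rule.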
1.2 Answer-Based Classification: Apart from the question, the answer can also be an important clue. For answer classification, bigram text features and verbs are extracted. Verbs make it easier to judge whether an answer can be related to video content. Intuitively, if a textual answer contains many complex verbs, it is more likely to describe a dynamic process and is therefore more likely to be well answered by videos. Therefore, verbs can…
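The bigram and verb features described in 1.2 can be sketched as below. This is a minimal illustration, not the authors' feature extractor. A real system would find verbs with a part-of-speech tagger. Here the small `ACTION_VERBS` lexicon is a hypothetical stand-in, and the function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical verb lexicon standing in for a part-of-speech tagger.
ACTION_VERBS = {"cut", "fold", "pour", "mix", "press", "turn", "pull", "tie", "stir"}


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def answer_features(answer):
    """Extract bigram counts plus simple verb statistics from an answer."""
    tokens = tokenize(answer)
    bigrams = Counter(zip(tokens, tokens[1:]))
    verb_count = sum(1 for token in tokens if token in ACTION_VERBS)
    verb_ratio = verb_count / len(tokens) if tokens else 0.0
    return {"bigrams": bigrams, "verb_count": verb_count, "verb_ratio": verb_ratio}


feats = answer_features("Fold the paper, then cut the corner and pull the flap.")
```

A higher `verb_ratio` would, under the intuition stated above, suggest that the answer describes a dynamic process.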