System Architecture Of Event And Temporal Information Extraction

This chapter presents the system architecture for event and temporal information extraction and describes the system model in detail. The first section discusses our data source. The system consists of four components. The first is responsible for data preprocessing. The second performs tagging and combines several syntactic and semantic tagging tools: the Stanford part-of-speech tagger, the Stanford parser, the HeidelTime temporal tagger, and the Stanford named entity recognizer. The third component is the extractor, and the fourth is the template generator. Each component is discussed in detail later in the chapter. The architecture is depicted in Figure 4.1.

Figure 4.1: System Architecture

4.2. Data source

To train and evaluate the prototype system, we use data from several sources: TimeBank 1.2, the AQUAINT TimeML Corpus, TempEval-2, and TempEval-3. The data from all of these sources are TimeML annotated. The TimeBank Corpus [25] contains 183 news articles annotated with temporal information: events, times, and temporal links between events and times. The annotation follows the TimeML 1.2.1 specification. The TimeBank sources come from a variety of news reports. Specifically, the articles come from the Automatic Content Extraction (ACE) program and from PropBank (TreeBank2) texts. Those from ACE are transcribed broadcast news from ABC, CNN, PRI, and VOA, and newswire from AP and NYT. PropBank supplied
