ASS-1

.docx

School

Illinois Institute Of Technology

Course

429

Subject

Computer Science

Date

Feb 20, 2024

Type

docx

Pages

11

Information Retrieval Assignment-1
A20545920
Satya Jaidev

Exercise 1.1
Draw the inverted index that would be built for the following document collection.

Doc 1  new home sales top forecasts
Doc 2  home sales rise in july
Doc 3  increase in home sales in july
Doc 4  july new home sales rise

Term        Postings
forecasts   -> 1
home        -> 1 2 3 4
in          -> 2 3
increase    -> 3
july        -> 2 3 4
new         -> 1 4
rise        -> 2 4
sales       -> 1 2 3 4
top         -> 1

Exercise 1.2
Consider these documents:

Doc 1  breakthrough drug for schizophrenia
Doc 2  new schizophrenia drug
Doc 3  new approach for treatment of schizophrenia
Doc 4  new hopes for schizophrenia patients

a. Draw the term-document incidence matrix for this document collection.

Term            Doc 1  Doc 2  Doc 3  Doc 4
approach        0      0      1      0
breakthrough    1      0      0      0
drug            1      1      0      0
for             1      0      1      1
hopes           0      0      0      1
new             0      1      1      1
of              0      0      1      0
patients        0      0      0      1
schizophrenia   1      1      1      1
treatment       0      0      1      0

b. Draw the inverted index representation for this collection.
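The inverted index for part b can be read off the matrix in part a: each term's postings list is the set of documents whose column holds a 1. The sketch below is one way to check this by machine. It is not part of the submitted answer. The names DOCS, build_inverted_index and incidence_matrix are made up for illustration, and the code assumes plain whitespace tokenization with no stemming or stop-word removal.

```python
from collections import defaultdict

# Documents from Exercise 1.2, keyed by docID.
DOCS = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of docIDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: tokens are lowercase words separated by whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sort the dictionary by term and each postings list by docID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def incidence_matrix(index, doc_ids):
    """Rebuild the 0/1 term-document rows from the postings lists."""
    return {
        term: [int(doc_id in ids) for doc_id in doc_ids]
        for term, ids in index.items()
    }


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    for term, ids in index.items():
        print(f"{term:<14} -> {ids}")
```

Running it prints one line per term with its postings list, and incidence_matrix(index, [1, 2, 3, 4]) gives back the rows of the matrix in part a, which is a quick consistency check between the two representations.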
Exercise 1.3
For the document collection shown in Exercise 1.2, what are the returned results for these queries:

a. schizophrenia AND drug
Doc 1, Doc 2

b. for AND NOT(drug OR approach)
Doc 4

Exercise 1.7
Recommend a query processing order for
d. (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)
given the following postings list sizes:

Term           Postings size
eyes           213312
kaleidoscope   87009
marmalade      107913
skies          271658
tangerine      46653
trees          316812

N(tangerine) + N(trees) = 363465
N(marmalade) + N(skies) = 379571
N(kaleidoscope) + N(eyes) = 300321

Order is (kaleidoscope OR eyes) AND (tangerine OR trees) AND (marmalade OR skies)

Exercise 1.8
If the query is: friends AND romans AND (NOT countrymen)
how could we use the frequency of countrymen in evaluating the best query evaluation order? In particular, propose a way of handling negation in determining the order of query processing.

For each of the n terms, get its postings list and process the terms in order of increasing frequency: start with the smallest set and keep cutting it down further. For a negated term, use the size of its complement. The count for a term X is the number of documents in which X occurs; the count for NOT X is (total number of documents) - (number of documents in which X occurs). So if countrymen is very frequent, NOT countrymen matches few documents and can be applied early to remove documents; if countrymen is rare, NOT countrymen removes little and is best applied last.

Exercise 2.1
Are the following statements true or false?

a. In a Boolean retrieval system, stemming never lowers precision.
False

b. In a Boolean retrieval system, stemming never lowers recall.
True

c. Stemming increases the size of the vocabulary.
False

d. Stemming should be invoked at indexing time but not while processing a query.
False

Exercise 2.6
We have a two-word query. For one term the postings list consists of the following 16 entries:
[4,6,10,12,14,16,18,20,22,32,47,81,120,122,157,180]
and for the other it is the one entry postings list:
[47].
Work out how many