assignment1_sol

.sql

School

Indiana University, Bloomington *

*We aren’t endorsed by this school

Course

561

Subject

Computer Science

Date

Apr 3, 2024

Type

sql

Pages

5

Uploaded by ColonelWorld13096

/* Assignment 1 will examine your knowledge about SQL using a subset of the UMLS dataset Please note that you can only use what we covered so far by the time this assigment is released. For example, aggregation is not allowed (e.g., group by) Please answer every block marked with "## Please ..." Each mistake deducts 5 points from a total of 100 points If more than 100 points are deducted, you will get 0 still */ ------ PART I: Single-Table Queries ------ -- Q1: Check 100 tuples in the "mrrel" table -- Requirement: Please self-study https://www.w3schools.com/sql/sql_top.asp -- You can use "limit n" to check a few tuples in a table to get familiar with its data and schema -- This is helpful when you formulate queries below select * from mrrel limit 100; -- Q2: Find the CUIs of concepts that have synonyms (i.e., more than 1 AUI) -- Hint: there are 592,039 results like "C0000005", "C0000039", "C0000052" select distinct con1.cui from mrconso con1, mrconso con2 where con1.cui = con2.cui and con1.aui <> con2.aui; -- Q3: Find the CUIs of concepts that only have one 1 AUI, create it as a materialized view "mrconso_ oneAUI" -- Hint: there are 178,595 results like "C0000266", "C0000353", "C0000666" create materialized view mrconso_oneAUI as ( (select distinct cui from mrconso) except (select distinct con1.cui from mrconso con1, mrconso con2 where con1.cui = con2.cui and con1.aui <> con2.aui) ); -- Q4: With the help of view "mrconso_oneAUI" above, recover their complete tuples, order by CUI -- Requirement: Use NATRUAL JOIN !!! select * from mrconso_oneAUI natural join mrconso order by cui; -- Q5: With the help of view "mrconso_oneAUI" above, recover their complete tuples, order by CUI -- Requirement: Use SET PREDICATE !!! select * from mrconso where cui in (select * from mrconso_oneAUI) order by cui; -- Q6: Find cui, aui, tty, str of all concepts whose string field (str) contains the word "tooth" and is from source (sab) "MSH", order by CUI -- Requirement: Please self-study string matching in SQL from https://www.w3schools.com/sql/sql_like.asp -- Hint: there are 19 results select cui, aui, tty, str from mrconso where str like '%tooth%' and sab='MSH' order
by cui; -- Q7: Find all different possible values of (rel, rela) pairs for source (sab) "MSH" -- Requirement: Look up "REL Description" in https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/ abbreviations.html -- Hint: there are 22 results select distinct rel, rela from mrrel where sab = 'MSH'; -- Q8: Find all different possible values of (rel, rela, sab) triples where source (sab) is not "MSH" -- Hint: there are 244 results select distinct rel, rela, sab from mrrel where sab <> 'MSH'; -- Q9: Find the common (rel, rela) pairs shared by both sources "MSH" and "SNOMEDCT_US" -- Hint: use set operation; the result is empty (select distinct rel, rela from mrrel where sab = 'MSH') intersect (select distinct rel, rela from mrrel where sab = 'SNOMEDCT_US'); -- Q10: Find the common "rela" shared by both sources "MSH" and "SNOMEDCT_US" -- Hint: use set operation; there are 2 results (select distinct rela from mrrel where sab = 'MSH') intersect (select distinct rela from mrrel where sab = 'SNOMEDCT_US'); -- Q11: Find all the possible (tui, sty) values -- Hint: check table "mrsty"; there are 127 results select distinct tui, sty from mrsty; ------ PART II: Multi-Table Queries ------ -- Q12: Find (cui, aui, sab, str) of all those concepts under semantic type T018 ("Embryonic Structure") -- Hint: there are 3235 results select mrconso.cui, aui, sab, str from mrconso, mrsty where mrconso.cui = mrsty.cui and tui = 'T018'; -- Q13: Find all those relations of "Alzheimer's disease" and its synonyms -- Hint: in SQL, to include a single quote character within a string literal, you need to escape it by using two consecutive single quotes. -- Hint: there are 264 results ---- Step 1: Finds all CUIs with str = "Alzheimer's disease", and see what are the CUIs? select * from mrconso where str = 'Alzheimer''s disease'; ---- Step 2: Find all synonyms of the CUIs above, return (aui, str, sab) only select aui, str, sab from mrconso where cui = 'C0002395'; ---- Step 3: Find all relations in mrrel where an end-concept is Alzheimer's disease, name it as a view "rel_AD" select * from mrrel where cui1 = 'C0002395' or cui2 = 'C0002395';
-- Q14: The relations above only has IDs like cui and aui. Please create a new version of this table listing (aui1, str1, rela, aui2, str2, sab) ---- Step 1: create a view recording Python-dict-style mapping: aui_str[aui] = str drop materialized view if exists aui_str; create materialized view aui_str as select aui, str from mrconso; ---- Step 2: use it to convert the result table of Q14, return tuples with format (aui1, str1, rela, aui2, str2, sab) ---- Hint: self-join on mrconso is needed since we have (aui1, aui2) select aui1, r1.str as str1, rela, aui2, r2.str as str2, sab from mrrel, aui_str r1, aui_str r2 where r1.aui = aui1 and r2.aui = aui2 and (cui1 = 'C0002395' or cui2 = 'C0002395'); ---- Step 3: ---- UMLS is actually a knowledge graph better modeled as an RDF database rather than a relational database. ---- An RDF database stores S-P-O (Subject-Predicate-Object) relations as in our mrrel table, but is more graph-query friendly. ---- However, only MeSH is currently supported with SPARQL API: https://id.nlm.nih.gov/mesh/ ---- We are basically using SQL to solve graph queries, which takes quite some joins. ---- Now, check the results from Step 2, such as ---- "Alzheimer's disease" "isa" "A27169206" "Alzheimer's disease co-occurrent with delirium" "SNOMEDCT_US" ---- Answer the question: is aui1 the subject or the object? is aui2 the subject or the object? ---- Hint: getting your understanding of the relation edge direction correct is important for the later queries to complete. ---- ################################################################ ---- ## Your Answer: aui 1 is the object, and aui 2 is the subject ---- ################################################################ -- Q15. find the overlapped relations (aui, paui) between mrrel and mrhier, return tuples of the form (cui, aui, paui, str, pstr) -- Hint: paui is parent-aui based on "isa" relationship, it is important to have a correct understanding of edge direction in Step 3 of Q15 above -- Hint: str can be obtained from the materialized view "aui_str", pstr is parent- str to be created for result relation -- Hint: there are 602431 results with pa_intersect as ( (select cui, aui, paui from mrhier) intersect (select cui2, aui2, aui1 from mrrel) ) select pa_intersect.cui, pa_intersect.aui, r.str as str, paui, rp.str as pstr from pa_intersect, aui_str r, aui_str rp where r.aui = pa_intersect.aui and rp.aui = paui; -- Q16. [Recursive View] -- We consider only the hierarchy defined by the "isa" relationship -- Find all concepts BELOW "Alzheimer's disease" and one level above "Alzheimer's disease" (i.e., its parents), return tuples of the form (aui, paui, str, pstr)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help