SAS Assignment #1

School

Texas A&M University, Corpus Christi

Course

5315

Subject

Statistics

Date

Feb 20, 2024

Framingham Heart Study: Data Preparation
Industry Aligned Activity

Purpose

This activity focuses on preparing data from the Framingham Heart Study for future statistical analyses, as well as exploring the data through descriptive statistics.

SAS Software

This activity can be performed using any SAS programming environment, including SAS Studio in SAS OnDemand for Academics.

Industry Alignment

This activity aligns with the healthcare industry. It uses data from a clinical study conducted to identify characteristics contributing to cardiovascular disease.

Framingham Heart Study: Data Preparation Industry Applied Activity

Table of Contents

Framingham Heart Study: Data Preparation
  Purpose
  SAS Software
  Industry Alignment
Activity Notes and Requirements
  Learning Objectives
  Estimated Completion Time
  Experience Level
  Prerequisite Knowledge
    Software
    Content Knowledge
  Additional Notes
Data Source
  Introduction
  Description of Variables
Framingham Heart Study: Data Preparation Activity
  Part 1: Understanding the Variables
  Part 2: Creating New Variables and Subsetting the Data
Appendix
  Appendix A: Access Software
  Appendix B: Helpful Documentation
  Appendix C: Recommended Learning

Activity Notes and Requirements

Learning Objectives

This activity provides practice with skills such as:
  • Implementing data changes and manipulations
  • Preparing data for possible future statistical analyses
  • Exploring data through descriptive statistics, including:
      o Understanding variables and their values within the data
      o Recognizing the need for changes in the data

Estimated Completion Time

This activity will take students approximately 3 hours to complete.

Experience Level

To complete this activity, students should have the following levels of experience:
  • Intermediate skill in SAS programming
  • Beginner skill in statistics

Prerequisite Knowledge

Software

Students should have experience with the following:
  • Foundations of programming with the SAS DATA step, including using functions and if/then/else conditional statements
  • SAS descriptive procedures such as PROC PRINT, PROC CONTENTS, PROC FREQ, PROC MEANS, and PROC UNIVARIATE

Content Knowledge

Students should be familiar with the following concepts:
  • Descriptive statistics such as mean, median, counts, and percentages
  • Conditional if/then/else logic

Additional Notes

This activity pairs well with the following activities that you will complete:
  • Framingham Heart Study: Descriptive Analysis, Industry Applied Activity
  • Framingham Heart Study: Statistical Analysis, Industry Applied Activity

Data Source

Introduction

This activity uses the HEART dataset in the SASHELP library. To access the SASHELP library in SAS, select View > Explorer. In the Explorer window, select Libraries > Sashelp.

The data came from the landmark Framingham Heart Study (https://framinghamheartstudy.org/). The purpose of the Framingham Heart Study was to identify characteristics contributing to
cardiovascular disease. Important links between cardiovascular disease and high blood pressure, high cholesterol levels, cigarette smoking, and many other health factors were first established using its data.

The original cohort of the Framingham Heart Study consisted of 5,209 men and women between the ages of 28 and 62 living in Framingham, Massachusetts. The first visit of data collection for participants in this cohort occurred between 1948 and 1953, and participants were assessed every two years thereafter through April 2014, almost 7 decades! The complete Framingham Heart Study data consists of hundreds of datasets taken over time at 32 biennial exams and has led to over 3,000 (wow!) published journal articles. To simplify analyses for illustrative purposes, the SASHELP.HEART dataset includes a snapshot of selected primary study variables taken at one of the biennial exams.

Description of Variables

The variables used for this exercise are:

Variable         Description
Status           Alive or dead
DeathCause       Cause of death
AgeCHDdiag       Age at which CHD was diagnosed
Sex              Male or female
AgeAtStart       Age at entry into the Framingham Heart Study
Height           Height in inches
Weight           Weight in pounds
Diastolic        Diastolic blood pressure
Systolic         Systolic blood pressure
MRW              Metropolitan Relative Weight
Smoking          Number of packs of cigarettes smoked per week
AgeatDeath       Age at death
Cholesterol      Total cholesterol
Chol_Status      Total cholesterol categorized into groups
BP_Status        Diastolic and systolic blood pressure categorized into groups
Weight_Status    Height and weight categorized into groups
Smoking_Status   Number of packs of cigarettes smoked per week categorized into groups
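
As a programmatic alternative to browsing the Explorer window, the variables described above can be inspected with two of the prerequisite procedures. This is a minimal sketch that assumes only that the SASHELP library is available in your SAS session; the `varnum` and `obs=` options are illustrative choices, not requirements of the activity.

```sas
/* List each variable's name, type, and length, along with the  */
/* number of observations. VARNUM orders the listing by the     */
/* variables' position in the dataset instead of alphabetically. */
proc contents data=sashelp.heart varnum;
run;

/* Preview the first five observations to see typical values. */
proc print data=sashelp.heart(obs=5);
run;
```

The PROC CONTENTS output reports the observation count and the type of every variable, which is the information needed to count numeric and character variables.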

Framingham Heart Study: Data Preparation Activity

This activity consists of two parts. Part one outlines how to explore the data to understand the variables for analysis. Part two outlines how to prepare the data for future analyses by creating new variables and subsetting the data.

Part 1: Understanding the Variables

Deciding on an appropriate path for analysis often requires many steps. An important first step is exploring and examining the data. An initial exploratory data analysis provides an understanding of the meaning of the study variables and can offer crucial clues about the data preparation needed before analyzing the data.

1. Open and examine the SASHELP.HEART dataset and its variables. Familiarize yourself with the context and meanings behind the variables and their values.
   a. How many observations are in the dataset?
   There are 5,209 observations in the SASHELP.HEART dataset.
   b. How many variables are in the dataset? How many are numeric? How many are character?
   There are 17 variables in the dataset: 10 are numeric and the remaining 7 are character.

Exploring the assigned values of character variables can reveal patterns and inherent orderings. By default, SAS orders the levels of a character variable alphabetically. The levels of many character variables, however, have an inherent ordering of magnitude. For example, non-smokers smoke less than light smokers, who smoke less than moderate smokers.

2. Tabulate the levels of the character variables in the SASHELP.HEART dataset. For each of the character variables:
   a. What data values or levels are observed for each?
   The Status variable has two values: Alive (3218) and Dead (1991).
   The DeathCause variable has five values: cancer (539), cerebral vascular disease (378), coronary heart disease (605), other (357), and unknown (112). A blank value indicates an individual who was still alive.
   The Sex variable has two values: Female (2873) and Male (2336).
   Chol_Status has three values: Borderline (1861), Desirable (1405), and High (1791).
   Similarly, BP_Status has three values: High (2267), Normal (2143), and Optimal (799).
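
The tabulation asked for in question 2 can be sketched with PROC FREQ. This is one possible approach rather than the activity's official solution: it uses the `_character_` variable list to select every character variable, and the `missing` option so that blank values (such as DeathCause for living participants) appear as their own level in the table.

```sas
/* One-way frequency table for every character variable.       */
/* _CHARACTER_ is a SAS variable-list keyword that expands to   */
/* all character variables in the dataset.                      */
/* MISSING includes blank values as a level in each table, so   */
/* they are counted in the frequencies and percentages.         */
proc freq data=sashelp.heart;
    tables _character_ / missing;
run;
```

Because PROC FREQ lists levels in alphabetical order by default, the output also shows where the displayed order differs from the natural order of magnitude, for example in BP_Status, where High is listed before Normal and Optimal.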