Assignment 4 Linear Regression v3

School

Howard University

Course

MISC

Subject

Statistics

Date

Jan 9, 2024

Uploaded by Dillah1

[Student Name]

DATA 320 Module 4
Linear Regression Model Development using Power BI

Name

Overview

For this assignment, you will:
  • Conduct Exploratory Data Analysis (EDA) to prepare data for further analysis.
  • Analyze data for relationships and/or trends using Power BI.
  • Create three visualizations to explain the data.
  • Use the template to answer questions about the data anomalies and your findings.
  • Create a linear regression model to perform a prediction.
  • Write a letter to future students in your university providing them with data-driven advice.

Scenario

You are a university student, and you are trying to understand the best ways to succeed. The professor has been offering study sessions to help the students and has asked each student to keep track of the number of minutes they spent on the last assignment. Your data analytics professor has written a survey with the following questions:
1) How many years have you been in school?
2) Are you a full-time or part-time student?
3) What is the name of your success coach?
4) What degree are you pursuing?
5) Did you attend the last study session?
6) How many minutes did you devote to the last assignment?
7) What grade did you get on the assignment (out of 100)?

The system automatically generated an ID number.

The professor provides you with the data (attached) and asks that you do the following:
1) List the questions and goals for the project.
2) Explore the raw data. What are the fields, and how many records? Find any anomalies and fix the data.
3) Perform a trend analysis.
4) Create at least 3 visualizations.
5) Create a linear regression.
6) Understand the implications of the model you created.
7) Write a letter to future students in the class giving them advice on how to succeed using the output from the linear regression.

Data 320 Assignment 4 Linear Regression
Exploratory Data Analysis Phase

Initial Questions | Answers
  • How would you state the problem you are trying to solve?
  • What are the project goals?
  • What questions are you trying to answer? Are there more questions that you can think of other than what is in the scenario?
  • Who is your audience? Are there additional stakeholders/decision-makers?

Explore the raw data, anomaly detection, and transformation | Answers/Results
  • Load the Excel file into Power BI Desktop.
    *If you are using the Virtual Lab (VDA) to access Power BI Desktop, follow the instructions in "Loading files and Publishing Power BI in the UMGC Virtual Lab", found in the classroom.
  • Click on the “Student Survey” worksheet and click Transform Data. You should now be in Power Query Editor.
  • Click on View and ensure that “column quality”, “column distribution”, and “column profile” are checked.
  • Go to the menu in the bottom left corner. How many columns do you have?
  • How many distinct rows of data do you have?
  • Do any of your fields have missing or empty data? Which one(s)?
  • Look closely at each field. Is there any unusual data that might need to be removed?
  • You checked with your professor and told them about the data anomalies that you discovered. They told you to ignore the missing records, but they suggest deleting the row with strange data. Return to Excel and delete the row that has the strange data. Then open that new file in Power BI again. Now, how many rows of data do you have?
  • For each field, what kind of data do you see (categorical or continuous)? What do you think each field means? Refer to the scenario for help.
    <<Enter your columns in the tables that follow these instructions. Add or remove any rows to the tables as needed>>
  • Is there a field that doesn’t look like categorical OR continuous (numerical)?

Examination of Categorical Fields
(Click on the field and use the column statistics to collect this data.)

Name of field | Brief description | Number of categories

Examination of Continuous (Numerical) Fields
(Click on the field and use the column statistics to collect this data. Round to two decimals.)

Name of field | Brief description | Minimum value | Maximum value | Average | Standard deviation
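The column statistics requested in the two tables above can be cross-checked outside Power BI. The following is a minimal Python sketch, not part of the assignment: the column names and the five rows are hypothetical stand-ins for the real survey file, and the standard deviation uses the sample (n - 1) form, which is an assumption about how Power BI computes its figure.

```python
import csv
import io
import statistics

# Hypothetical sample rows; the real survey file is not reproduced here.
SAMPLE_CSV = """ID,Years in school,Status,Attended study session,Total minutes,Grade
1,2,Full-time,Yes,120,68
2,1,Part-time,No,60,58
3,3,Full-time,Yes,240,88
4,2,Full-time,,180,78
5,4,Part-time,Yes,300,98
"""


def profile(csv_text, categorical, continuous):
    """Return row/column counts, fields with gaps, and per-field statistics."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {"rows": len(rows), "columns": len(rows[0]) if rows else 0}
    # Fields that contain at least one empty value.
    report["missing"] = sorted(
        {name for row in rows for name, value in row.items() if value.strip() == ""}
    )
    for name in categorical:
        # Number of distinct non-empty categories.
        report[name] = {"categories": len({r[name] for r in rows if r[name].strip()})}
    for name in continuous:
        nums = [float(r[name]) for r in rows if r[name].strip()]
        report[name] = {
            "min": round(min(nums), 2),
            "max": round(max(nums), 2),
            "average": round(statistics.mean(nums), 2),
            "std_dev": round(statistics.stdev(nums), 2),  # sample standard deviation
        }
    return report


report = profile(
    SAMPLE_CSV,
    categorical=["Status", "Attended study session"],
    continuous=["Total minutes", "Grade"],
)
for key, value in report.items():
    print(key, value)
```

If the figures from a sketch like this differ slightly from Power BI's column profile, the likely cause is sample versus population standard deviation rather than a data problem.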
Trend Analysis

Step | Answers/Results
  • Before you get into the data, using only your common sense and experience, what factors do you think would lead to a higher grade on the assignment?
  • After looking at the columns, which field do you think is the target variable that we want to study?
  • Click the “Close and Apply” button in Data Transformation. We will return to this after you have spent some time analyzing the data. You should now be in the Power BI Desktop main window.
  • Click on the Key Influencer icon in the Visualizations pane.
  • Move “Grade” into the Analyze box and move “Total Minutes Spent” into the Explain By box.
  • Repeat the step above for the other fields to see what happens with “Grade”.
  • Think back to your top reasons why you think students would get good grades. Did the data SUPPORT or CONTRADICT your thoughts? Which ones?
  • Looking at the influencers the data found, what are some of the factors influencing the student’s grade?
  • In plain language, how would you explain the key influencers you found?
  • Use the Key Influencer to analyze “Grade” using only “Total Minutes spent on the Assignment” as the Explain By variable. What values did you get?
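One simple way to sanity-check what an influencer on a categorical field is saying is to compare the average grade in each category. The sketch below does that in Python. It is not how Power BI computes key influencers, and the (attended, grade) pairs are hypothetical, not taken from the survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (attended study session, grade) pairs, not the survey data.
records = [("Yes", 88), ("Yes", 92), ("No", 70), ("No", 74), ("Yes", 84)]


def average_by_group(pairs):
    """Average grade for each category label."""
    groups = defaultdict(list)
    for label, grade in pairs:
        groups[label].append(grade)
    return {label: round(mean(grades), 2) for label, grades in groups.items()}


averages = average_by_group(records)
print(averages)
print("difference:", averages["Yes"] - averages["No"])
```

A gap between the group averages suggests a relationship worth exploring, but on its own it does not show that attending the session caused the higher grade.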
Visualization

Using your knowledge about data visualizations, create at least three visualizations that help to show or explain something significant about the data. Create a new page for each visualization.

Regression Analysis

Step | Answers/Results
  • Create a new page.
  • Create a scatterplot visualization with Total Minutes Spent on Assignment on the X-Axis and Grade on the Y-Axis. Change the aggregation defaults from SUM to Don’t Summarize for both.
  • Look at the scatterplot created. What is a general statement you can say about this graph?
  • From the Analytics pane, add a Trend Line.
  • What happened to your graph?
  • Click on New Measure.
  • Copy and paste the following DAX code to create a Correlation Coefficient:

    coeff corr =
    // x̄
    VAR __muX = CALCULATE ( AVERAGE ( 'Student survey'[Total minutes spent on assignment] ) )
    // ȳ
    VAR __muY = CALCULATE ( AVERAGE ( 'Student survey'[Grade] ) )
    // numerator
    VAR __numerator =
        SUMX (
            'Student survey',
            ( 'Student survey'[Total minutes spent on assignment] - __muX )
                * ( 'Student survey'[Grade] - __muY )
        )
    // denominator
    VAR __denominator =
        SQRT (
            SUMX ( 'Student survey', ( 'Student survey'[Total minutes spent on assignment] - __muX ) ^ 2 )
                * SUMX ( 'Student survey', ( 'Student survey'[Grade] - __muY ) ^ 2 )
        )
    RETURN
        DIVIDE ( __numerator, __denominator )

  • Click the check mark. Then click the X to close the window. You will now see the new measure on the right side under Fields.
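To see what the DAX measure computes, the same formula can be written in a few lines of Python. This is a sketch for checking your understanding, not a replacement for the measure. The minutes and grades below are hypothetical values, and returning None when the denominator is zero is a stand-in for the blank result that DAX's DIVIDE gives in that case.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, mirroring the structure of the DAX measure."""
    mu_x = sum(xs) / len(xs)  # x-bar
    mu_y = sum(ys) / len(ys)  # y-bar
    numerator = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mu_x) ** 2 for x in xs) * sum((y - mu_y) ** 2 for y in ys)
    )
    # Stand-in for DIVIDE: no result when the denominator is zero.
    return numerator / denominator if denominator else None


# Hypothetical values, not the survey data.
minutes = [60, 120, 180, 240, 300]
grades = [58, 70, 76, 90, 96]
r = pearson_r(minutes, grades)
print(round(r, 4))
```

The result always lies between -1 and 1: values near 1 mean grades rise almost in a straight line with minutes, values near 0 mean no linear relationship, and negative values mean grades fall as minutes rise.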
  • Adjust your scatterplot to add a little room to the right of the graph.
  • Click the “Card” visualization next to your scatterplot.
  • Click and drag the “coeff corr” measure you just made onto the card. You should now see the Correlation Coefficient. What is the number it returned?
  • What does a correlation coefficient mean? Review the Week 7 Readings and Presentations to understand what the correlation coefficient is.

Interpreting a Linear Regression Model

The linear regression model is one of the oldest statistical methods used for predicting continuous variables. As you have done, we plotted two variables, x = time spent on assignment and y = grade, on a scatterplot, and a trend line was calculated to find the line closest to all the points plotted on the graph. The equation is written as y = a + bx, where y is the output that you want to know, x is the input, a is where the line crosses the y-axis (the intercept), and b is the slope of the line. The slope tells us the relationship between the two variables: in other words, what happens to the grade when the time spent on the assignment decreases or increases?
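The intercept a and slope b of a trend line are typically found by ordinary least squares. The sketch below shows that calculation in Python under stated assumptions: the data points are hypothetical, and the function is an illustration of the standard formulas rather than the routine Power BI or Excel actually runs.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: change in y per unit of x
    a = mean_y - b * mean_x  # intercept: value of y where x = 0
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variation in y explained by the line
    return a, b, r_squared


# Hypothetical values, not the survey data.
minutes = [60, 120, 180, 240, 300]
grades = [58, 70, 76, 90, 96]
a, b, r_squared = fit_line(minutes, grades)
print(f"y = {a:.3f} + {b:.4f}x, R^2 = {r_squared:.4f}")
```

For a line fitted to a single input variable, R² equals the square of the correlation coefficient. So a trend line with R² = 0.8972 corresponds to a correlation coefficient of about 0.947, provided both numbers come from the same data.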
This is where things get a little tricky! Earlier in this exercise, you examined the key influencers on grades. If you only analyze “Grade” explained by “Total minutes spent on assignment”, then you likely discovered that for every 74.76 minutes spent on the assignment, the grade increases by 10.97 points. This seems reasonable. Fill in the expected output (grade) below for each input:

x = time spent on assignment (minutes) | y = grade (10.97 for every 74.76)
0 | 0
74.76 |
149.52 |
224.28 |

Does this table seem reasonable? No, the grades are very low because we don’t know where the line starts. The key influencer is helpful to tell us that for every 74.76 minutes a student spends, their grade will increase by 10.97 points, but this assumes the grade starts at some number higher than 0!

Examine the trend line on the scatterplot that you made. If you find around 74.76 on the X-axis and move up on the graph, where does the line hit? Maybe somewhere around 61? Unfortunately, this is very difficult to know exactly because Power BI does NOT provide the exact linear regression equation it used to draw the trend line.

Your data analytics professor wants you to practice using a linear regression and provides you with the equation you need (they got it from plotting the data in Excel and adding a trend line). The equation that you need is:

y = 48.325 + 0.1648x, with an R² value of 0.8972

Now, let’s go back to our simple table and try this exercise again (we’ve done the first line for you). Fill in the y (grade) predicted for each x (minutes spent on the assignment) in the table that follows.
x = time spent on assignment (minutes) | y = grade (y = 48.325 + 0.1648x)
0 | 48.325
30 |
60 |
74.76 |
90 |
100 |
120 |
149.52 |
180 |
200 |
224.28 |
240 |
280 |
300 |
310 |
320 |

Model interpretation

Steps | Results/Answers
  • Now, go back again to your trendline and look again at the values. NOW does this make sense?
  • What are the differences between the grades for 74.76, 149.52, and 224.28? What is the number you found? In other words, for every 74.76 minutes spent, what happens? Is this the same or different than the Key Influencer value?
  • What is the difference in grades between 120 minutes and 60 minutes? In other words, for every 60 minutes spent, what happens?
  • What is the difference in grades between 240 minutes and 280 minutes? In other words, for every 40 minutes spent, what happens?
  • What is the difference in grades between 200 minutes and 180 minutes? In other words, for every 20 minutes spent, what happens?
  • What is the difference in grades between 310 minutes and 300 minutes? In other words, for every 10 minutes spent, what happens?
  • What happened when you entered 320 minutes into the equation?
  • What about the R² value of 0.8972 the professor gave you from the equation? We can interpret this as telling us that the regression line explains about 89.72% of the variation in grades, which is pretty good! Usually, the higher an R² value, the better. Do you feel confident in a model that explains 89.72% of the variation in grades?
  • Armed with this information, what advice would you give a future student taking the course?
  • Is this different than what you originally found in your trend analysis when you used the key influencer? How can we have two different predictive values?
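The arithmetic behind these questions can be sketched in a few lines of Python. The intercept, slope, and Key Influencer figures are the ones quoted in the assignment. The function names and the choice of inputs are illustrative only.

```python
# Equation supplied in the assignment: y = 48.325 + 0.1648x
INTERCEPT = 48.325
SLOPE = 0.1648

# Key Influencer summary quoted in the assignment: +10.97 points per 74.76 minutes
KI_POINTS = 10.97
KI_MINUTES = 74.76

MAX_GRADE = 100  # grades in the survey are out of 100


def predict_grade(minutes):
    """Predicted grade from the simple linear regression line."""
    return INTERCEPT + SLOPE * minutes


def grade_change(minutes_from, minutes_to):
    """Change in predicted grade between two inputs; the intercept cancels out."""
    return SLOPE * (minutes_to - minutes_from)


def rate_only_grade(minutes):
    """Grade implied by the Key Influencer rate alone, starting from 0."""
    return KI_POINTS * minutes / KI_MINUTES


for x in (0, 60, 74.76, 120, 320):
    y = predict_grade(x)
    flag = " (above the 100-point maximum)" if y > MAX_GRADE else ""
    print(f"x={x:>6}: line={y:7.3f}  rate only={rate_only_grade(x):6.2f}{flag}")

print(f"change per 60 minutes: {grade_change(60, 120):.3f}")
print(f"change per 74.76 minutes: {grade_change(0, 74.76):.3f}")
```

Because the intercept cancels when two predictions are subtracted, every difference depends only on the slope and the gap in minutes. With these values the line rises by about 12.32 points per 74.76 minutes, compared with the 10.97 reported by the Key Influencer, and 320 minutes gives a prediction of about 101.06. That is above the maximum possible grade, a reminder that the line should not be trusted far outside the range of the observed data.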
What is happening is that Power BI uses a proprietary multiple regression calculation in the key influencer visual, which means that it looks at different combinations of variables and how they affect the grade. However, when we plotted the scatterplot, we explicitly chose only two variables for a simple linear regression. So, technically, both can be correct. They just use different methods to perform the calculations.

  • In plain language, how would you explain your linear regression model to a future student?

Publishing to Power BI Service

Make one more page in your Power BI Desktop file that has the three visualizations you created and the scatterplot with the trendline. Arrange them so that they are easy to see on one page. You will be inserting this as a graphic into your letter.

In order to use some of the other Power BI tools that are only available in the Power BI Service, you need to Publish to Power BI. Once you are satisfied with the three visualizations and the trendline you created in the steps above, click on the “Publish” icon in the toolbar.

First, you will be prompted to save your Power BI Desktop workbook. Save it and make note of the location.
*If you are using the Virtual Lab (VDA) to access Power BI Desktop, follow the instructions in "Loading files and Publishing Power BI in the UMGC Virtual Lab", found in the classroom.

Next, click on “My workspace” and click “Select”.
After it finishes, click on the link to open your dashboard in Power BI Service. You might need to sign into your Power BI Service account.

Once you are in Power BI Service, click on “Export” and choose PowerPoint.
Be sure to select “Current Values” so that it exports ALL the pages you created into PowerPoint. This will download a PowerPoint file.

Creating your letter to future students

Now that you have done all the data exploration, data preparation, visualizations, and a linear regression analysis, you are ready to enter your findings into the Assignment 4 Letter to Future Student Template. You have been given a template to use, so make sure to follow all the instructions in that template.

Final Instructions on Submitting Your Assignment

You will submit three files into the Assignment in LEO:
1) Assignment 4 Letter to Future Student Template
2) Data Exploration and Preparation (this document) showing your work
3) PowerPoint export from Power BI Service so your professor can see your work
Rubric
Module 4 Linear Regression

Initial Questions (5 pts)
  • Exceeds: Provides insightful and accurate responses to initial questions based on the scenario.
  • Meets: Provides reasonable responses to initial questions based on the scenario.
  • Does not meet: Does not provide reasonable responses to initial questions based on the scenario.

Data exploration, anomaly detection, and data preparation (10 pts)
  • Exceeds: Provides an insightful and accurate exploration of the raw data. Might apply analysis in creative ways. Correctly performs the data preparation.
  • Meets: Provides an accurate exploration of the raw data. Data is only partially prepared.
  • Does not meet: Does not provide an accurate exploration of the raw data. No data preparation is performed.

Initial Trend Analysis (5 pts)
  • Exceeds: Provides an insightful and accurate analysis of initial trends. Might apply analysis in creative ways.
  • Meets: Provides an accurate analysis of initial trends. Analysis might tend towards the obvious.
  • Does not meet: Does not provide an accurate analysis of initial trends.

Visualizations (15 pts)
  • Exceeds: Creates three visualizations that explain the data set.
  • Meets: Creates one or two visualizations that explain the data set.
  • Does not meet: Does not create any visualizations that explain the data set.

Regression Analysis (15 pts)
  • Exceeds: Correctly creates a scatterplot with a trendline.
  • Meets: Creates a scatterplot, but the axes are not in the correct positions or a trendline is missing.
  • Does not meet: Does not create a scatterplot graph.

Interpretation of results (20 pts)
  • Exceeds: Completes the two tables of calculations required for the analysis. Correctly interprets the results of the model.
  • Meets: Provides some of the calculations required for interpretation. Misses the interpretation of the model.
  • Does not meet: Does not provide an accurate analysis of the linear regression table.

Writing of letter to future student (30 pts)
  • Exceeds: Expresses ideas clearly, concisely, and in a logical order.
  • Meets: Main ideas are expressed clearly, although there might be minor issues with grammar, punctuation, or typos that do not impede meaning.
  • Does not meet: Does not present ideas in a clear manner. Submission may be marred by significant grammar or spelling errors.