4-3 Project One Submission

School: Southern New Hampshire University
Course: 240
Subject: Economics
Date: Apr 3, 2024
Type: docx
Pages: 12
Uploaded by DukeRiverWren25

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Juanita Onasanya
Southern New Hampshire University
MAT240: Applied Statistics
Ole Forsberg
March 27, 2024
Introduction
This analysis examines whether the square footage of a home is a reliable indicator of its listing price. It uses a random sample of fifty homes from various counties in the United States, recording the square footage and the listing price of each. Linear regression is used to describe the relationship between the two variables, and the scatter plot shows a linear relationship between them. The listing price is the dependent variable (y), and the square footage is the independent variable (x). The findings indicate a positive correlation of moderate strength between a property's square footage and its listing price. In addition, outliers in the data may affect the reliability of the model for forecasting.

Data Collection
The sampling process is essential to producing an unbiased sample that accurately represents a population. I assigned each county a random number between 0 and 1 using the spreadsheet =RAND function, sorted the numbers in ascending order, and selected the first fifty counties as the sample.
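The sampling procedure described above (random key per row, sort ascending, keep the first fifty) can be sketched in Python. This is only an illustration of the technique, not the spreadsheet used for the report: the function name `sample_by_random_key`, the seed, and the placeholder county names are all assumptions.

```python
import random

def sample_by_random_key(items, k, seed=None):
    """Assign each item a RAND()-style key in [0, 1), sort by key, keep the first k."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])  # ascending, smallest key first
    return [item for _, item in keyed[:k]]

# Hypothetical population: placeholder county names, not the real dataset.
counties = [f"county_{i}" for i in range(1, 1001)]
sample = sample_by_random_key(counties, k=50, seed=240)
print(len(sample))
```

Because every row receives an independent random key, each county has the same chance of landing in the first fifty positions, which is what makes the result a simple random sample.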
For this analysis, the response variable (y) is the listing price of the property, and the predictor variable (x) is the square footage of the home.

Data Analysis
According to Zybooks.com, the histogram of square footage is right-skewed: most values are concentrated on the left side of the graph, while the right side has a tail of low-frequency bins. Because the mean exceeds the median, the tail is pulling the center of the distribution to the right. The long right tail accounts for the outliers on that side, where some values lie much farther from the mean than others, and this inflates the standard deviation. The data are widely spread, with most values falling between 1,000 and 4,500 square feet, and are centered around 2,000 square feet.

The histogram of listing prices is approximately normal with a slight right skew. The mean exceeds the median, which implies that a small number of larger values are pulling the center of the distribution to the right. Although there are minor peaks between 250,000 and 350,000, the distribution is concentrated around 300,000. The data are widely dispersed, ranging from about 140,000 to over 530,000, and the few values above 500,000 appear to be outliers on the right side of the distribution.

The mean and median listing prices in my sample are below the national average, so listing prices in my sample are somewhat lower than the norm. The standard deviation of my sample is also lower than the national standard deviation, so the data in my sample are less dispersed than the data for the whole country.
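The mean-versus-median comparison used above to judge skew can be sketched as follows. This is a rule of thumb rather than a formal skewness test, and the square footage values are made-up illustration data, not the report's sample; the function name `skew_direction` is likewise an assumption.

```python
import statistics

def skew_direction(values):
    """Rule of thumb: a mean above the median suggests a right skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical square footage values with one large home in the right tail.
sqft = [1200, 1500, 1800, 2000, 2100, 2300, 4500]
print(statistics.mean(sqft), statistics.median(sqft), skew_direction(sqft))
```

In this made-up data the single 4,500-square-foot home lifts the mean above the median, which is the same pattern the histograms in the report show.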
The national listing prices appear to be right-skewed, with outliers toward the upper end of the range. The median being smaller than the mean supports this reading. The values are widely spread: the range (maximum minus minimum) is $852,300, and a few high-value outliers lie above the upper whisker (highest value: $987,600).

The mean and median square footage in my sample differ only slightly from the national values. This implies that the square footage in my sample is slightly below the national mean and median. The standard deviation of my sample is small compared to the national standard deviation, which suggests that my sample data vary less than the national data. Because the national median is less than the mean, the distribution of national square footage may be right-skewed. The national standard deviation is 921 square feet, which indicates considerable variability in home sizes: a typical home differs from the national mean by roughly 921 square feet. The national interquartile range, the third quartile minus the first quartile, is 589 square feet, so the middle fifty percent of homes fall between 1,626 and 2,215 square feet. Because the dataset has a minimum of 1,101 and a maximum of 6,516 square feet, it probably contains outliers.

Overall, the comparisons indicate that my sample has lower means and medians and less variability than the national population. Even so, the values from my sample are fairly similar to those of the whole
nation. As a result, I believe that my sample could reasonably represent the national housing market.

Develop Regression Model
The trend line drawn through the data points graphically highlights the relationship between the variables by showing how closely the points align with the line. Because the observed pattern indicates a linear relationship between the response variable and the predictor variable, a regression model is appropriate for this study. The scatter plot shows that the square footage of the homes (x) is positively correlated with the listing price (y). The variable x represents the square footage of the home, whereas the variable y denotes the listing price. The regression equation is the formula used to estimate the value of the response variable from the value of the predictor variable. The slope is the change in y between any two points on the trend line divided by the corresponding change in x between those points. In the equation y = 72.9x + 180530, the slope of the line (72.9) is the rate of change of the listing price with respect to the square footage of the property. The
slope of the line indicates the expected change in listing price for each additional square foot. For example, the listing price is predicted to increase by $72.90 for each additional square foot of the home. The intercept is the value of y when x equals zero. Here, the intercept of 180,530 is the predicted listing price of a property with zero square feet. In other words, the model assigns a baseline price of $180,530 before any square footage is counted, which could be read as the value of the land alone. Because no home in the data has zero square feet, this value is an extrapolation. The coefficient of determination, known as the R-squared statistic, measures the proportion of the variability in the response variable that the predictor variable accounts for. As stated on Zybooks.com, it assesses how well the regression line fits the observed data. For this model, R-squared is 0.439, which means that variation in house size explains about 43.9% of the variability in listing price. The remaining 56.1% of the variation is attributable to factors that were not included in the model. The regression equation for my sample is y = 72.9x + 180530. To price a home of 1,500 square feet, I substitute x = 1,500 into the equation: y = 72.9(1500) + 180530 = 289,880. The predicted listing price for a home of 1,500 square feet is therefore $289,880.

Conclusion
Using a random sample of fifty houses from different counties in the United States, I found that the size of a home influences its listing price to some extent. The regression equation y = 72.9x + 180530 provides an estimate of a home's listing price from its square footage, although the estimate may not match the actual listing price in every case. A scatterplot of the predictor and response variables showed a positive correlation of moderate strength. The histograms suggest skewness in the data, which may reduce the effectiveness of a prediction model built on this dataset. When estimating a listing price, it is important to account for high values or outliers that could distort the result. Still, the prevailing pattern indicates that square footage is a determinant of listing price. Because larger houses typically cost more, the results align with my initial expectations. The outliers reflect exceptions to this rule: other variables, such as location, renovations, and features, also influence the price of a property. To make the data more representative of a typical house in the United States, I recommend omitting property values that deviate substantially from the norm. One question I would like to investigate further: how does this model perform for houses situated in affluent neighborhoods? I am optimistic that this would help identify the features of specific homes that lead to substantially higher prices.
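The regression equation reported for the sample, y = 72.9x + 180530, and its R-squared of 0.439 can be checked with a short Python sketch. The coefficients come from the report; the function and constant names are illustrative assumptions, and the sketch only evaluates the fitted line rather than refitting it from data.

```python
SLOPE = 72.9         # dollars per additional square foot (from the report)
INTERCEPT = 180_530  # predicted price at zero square feet (from the report)
R_SQUARED = 0.439    # share of price variability explained by square footage

def predict_listing_price(square_feet):
    """Evaluate the fitted line y = 72.9x + 180530 for a given square footage."""
    return SLOPE * square_feet + INTERCEPT

price = predict_listing_price(1500)
unexplained = 1 - R_SQUARED  # share of variability left to other factors

print(f"${price:,.0f}")      # $289,880
print(f"{unexplained:.1%}")  # 56.1%
```

Predictions are most trustworthy for square footage inside the range of the sample; the intercept at zero square feet lies outside that range.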