11/15/23, 1:22 AM
17.6. Multiple Regression — Computational and Inferential Thinking
https://inferentialthinking.com/chapters/17/6/Multiple_Regression.html
17.6.2.2. Interpreting Multiple Regression
Let’s interpret these results. The best slopes give us a method for estimating the price of a
house from its attributes. A square foot of area on the first floor is worth about $69 (the first
slope), while one on the second floor is worth about $74 (the second slope). The final, negative
slope (for Yr Sold) describes the market: prices in later years were lower on average.
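To make this interpretation concrete, the sketch below applies the slopes printed in this section to a single house. It assumes, as the `fit` function in this section suggests, that a prediction is the sum of each slope times its attribute, with no intercept. The house's attribute values are invented for illustration, and `predict_price` is a hypothetical helper name rather than one from the text.

```python
import numpy as np

# Best slopes as printed in this section, in column order.
labels = ['1st Flr SF', '2nd Flr SF', 'Total Bsmt SF', 'Garage Area',
          'Wood Deck SF', 'Open Porch SF', 'Lot Area', 'Year Built', 'Yr Sold']
slopes = np.array([68.7068, 74.3857, 56.0494, 36.1706, 26.4397,
                   21.4779, 0.558904, 534.101, -528.216])

def predict_price(attributes):
    """Predicted price: sum of slope * attribute (no intercept term)."""
    return np.sum(slopes * np.array(attributes))

# Hypothetical house (values invented for illustration only).
house = [1000, 800, 900, 500, 100, 50, 9000, 2000, 2008]

# Show how much each attribute contributes to the prediction.
for label, slope, value in zip(labels, slopes, house):
    print(label, round(slope * value, 2))
print('Predicted price:', predict_price(house))

# One extra square foot on the first floor changes the prediction
# by the first slope, about 68.71 dollars.
bigger = list(house)
bigger[0] += 1
print(predict_price(bigger) - predict_price(house))
```

Note that the large positive Year Built contribution and the large negative Yr Sold contribution mostly cancel, so those two slopes are best read together rather than one at a time.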
The RMSE of around $30,000 (about $29,300 in the output shown) means that our best linear
prediction of the sale price based on all of the attributes is off by around $30,000 on the
training set, on average. The error on the test set is similar, about $33,000, which suggests
that our prediction method will generalize to other samples from the same population.
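As a rough illustration of what the RMSE measures, here is a minimal NumPy sketch. The `rmse` below is a stand-in that takes the same arguments in the same order as the call in this section, but it works on plain arrays and is not the text's own definition, which is not shown here. The slopes, attributes, and prices are toy values invented for the example.

```python
import numpy as np

def rmse(slopes, attributes, prices):
    """Root mean squared error of linear predictions (no intercept)."""
    predictions = attributes @ slopes   # one prediction per row
    errors = prices - predictions
    return np.sqrt(np.mean(errors ** 2))

# Toy data, invented for illustration: two attributes per house.
slopes = np.array([100.0, 50.0])
train_attributes = np.array([[1000, 200], [1500, 0], [800, 400], [1200, 300]])
train_prices = np.array([112000, 148000, 101000, 134000])
test_attributes = np.array([[900, 100], [1100, 500]])
test_prices = np.array([97000, 132000])

train_error = rmse(slopes, train_attributes, train_prices)
test_error = rmse(slopes, test_attributes, test_prices)
print('Training RMSE:', train_error)
print('Test RMSE:', test_error)
```

Comparing the two numbers is the same check made in the text: a test error close to the training error is evidence that the slopes are not tuned only to the training rows.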
If the predictions were perfect, then a scatter plot of the predicted and actual values would be a
straight line with slope 1. We see that most dots fall near that line, but there is some error in the
predictions.
1st Flr SF | 2nd Flr SF | Total Bsmt SF | Garage Area | Wood Deck SF | Open Porch SF | Lot Area | Year Built | Yr Sold
68.7068    | 74.3857    | 56.0494       | 36.1706     | 26.4397      | 21.4779       | 0.558904 | 534.101    | -528.216
RMSE of all training examples using the best slopes: 29311.117940347867
test_prices = test.column(0)
test_attributes = test.drop(0)

def rmse_test(slopes):
    return rmse(slopes, test_attributes, test_prices)

rmse_linear = rmse_test(best_slopes)
print('Test set RMSE for multiple linear regression:', rmse_linear)
Test set RMSE for multiple linear regression: 33025.064938240575
def fit(row):
    return sum(best_slopes * np.array(row))