Using `polyval` and Fitting Higher Order Polynomials (Example 2)
```python
polyval(model_coefficients_tuple, x_values)
```
The `polyval` function in Pylab:
- Accepts the coefficients returned by `polyfit` along with x-values, and returns the predicted y-values.
- Benefit: it works for any polynomial order, so the same code can be reused for other models (see the sketch below).
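A minimal sketch of the `polyfit`/`polyval` workflow, assuming NumPy (Pylab re-exports the same functions); the data values here are made up for illustration:

```python
import numpy as np

# Made-up observations for illustration
x_vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_vals = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# polyfit returns the model's coefficients, highest power first
model = np.polyfit(x_vals, y_vals, 1)        # degree-1 (linear) fit

# polyval applies those coefficients to any x-values, giving predicted y-values
estimated_y = np.polyval(model, x_vals)
```

Because `polyval` only needs the coefficient array, the same call works unchanged whether the model came from a degree-1, degree-2, or degree-16 fit.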
In this section:
- Second example data set introduced and plotted.
- Fitting a line (degree 1) gives what is visually a "lousy fit".
- Trying a higher-order model: fitting a parabola (degree 2).
- Fitting a quadratic is still linear regression: the model remains linear in its coefficients, just in a higher-dimensional space.
- Visually, the quadratic fit is clearly better than the linear fit for the second data set (both fits are sketched below).
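A sketch of how that comparison might be plotted; the data set below is a placeholder stand-in, not the course's actual second data set:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder stand-in for the second data set
x_vals = np.arange(0.0, 11.0)
y_vals = np.array([0.5, 3.2, 7.9, 15.1, 24.8, 37.2, 51.9, 69.3, 88.8, 111.0, 135.5])

# Fit a line (degree 1) and a parabola (degree 2) to the same data
linear_model = np.polyfit(x_vals, y_vals, 1)
quadratic_model = np.polyfit(x_vals, y_vals, 2)

# Plot the measured points against both fits for visual comparison
plt.plot(x_vals, y_vals, 'bo', label='Measured points')
plt.plot(x_vals, np.polyval(linear_model, x_vals), 'r', label='Linear fit')
plt.plot(x_vals, np.polyval(quadratic_model, x_vals), 'g', label='Quadratic fit')
plt.legend()
plt.show()
```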
Evaluating Goodness of Fit (Relative vs. Absolute)
Question: how can we objectively determine which fit is better, beyond eyeballing the plots?
- Fits can be compared relatively (is one fit better than another?) or absolutely (how close is a fit to optimal?).
- A measure for relative comparison: mean squared error (MSE).
- MSE = sum of squared differences / number of samples.
- A function `get_average_error` computes this (a sketch appears after this list).
- Comparing the two fits on the second data set: the quadratic's MSE is roughly one sixth of the linear fit's.
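A sketch of what `get_average_error` might look like, following the definition above; the exact body is an assumption rather than the course's code:

```python
import numpy as np

def get_average_error(observed, predicted):
    """Mean squared error: sum of squared differences divided by the number of samples."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return ((observed - predicted) ** 2).sum() / len(observed)

# Relative comparison: the fit with the smaller MSE matches the data more closely,
# e.g. get_average_error(y_vals, np.polyval(quadratic_model, x_vals))
```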
MSE is suitable for relative comparison but has the following limitations:
- Not absolute: does not indicate whether an MSE value is "good" or "bad".
- Not scale independent: varies with data values.
Absolute Goodness of Fit: Coefficient of Determination (R²)
The coefficient of determination (R²) is a standardized, scale-independent measure of fit.
- R² = 1 - (Model's sum of squared errors / Total sum of squares of the data).
- Numerator: calculates error of the fit (sum of (observed - predicted)²).
- Denominator: measures overall variability of the data (sum of (observed - mean of observed)²).
- R² measures the proportion of variability in the data explained by the model.
- For linear regression, R² will always be between 0 and 1.
- R² = 1: Model accounts for all variability (ideal fit).
- R² = 0: Model accounts for no variability (fit no better than simply using the mean of the data).
- R² ≈ 0.5: Model explains about half the variability.
- Objective: find a fit whose R² is as close to 1 as possible (a helper sketch follows).
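A sketch of an R² helper that follows the definition above; the name `r_squared` and its exact body are assumptions, not the course's code:

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (model's squared error / total variability)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = ((observed - predicted) ** 2).sum()               # sum of (observed - predicted)^2
    variability = ((observed - observed.mean()) ** 2).sum()   # sum of (observed - mean)^2
    return 1 - error / variability
```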
Testing Fits with R² and Multiple Degrees
The functions `gen_fits` and `test_fits` fit several polynomial degrees and report R² for each (a combined sketch follows this list).
- `gen_fits` uses `polyfit` to generate models for a list of degrees.
- `test_fits` uses `polyval` for prediction and calculates R² for each model.
- Running with second data set and degrees 1 and 2:
- R² for linear fit is horrible (< 0.005).
- R² for quadratic fit is good (~0.84).
- R² confirms quadratic is much better.
- Running with higher degrees (2, 4, 8, 16):
- R² rises with degree (≈ 0.84 for degree 2, slightly higher for degrees 4 and 8).
- Degree 16 gives a very high R² (≈ 0.97), explaining nearly 97% of the variability.
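A sketch of how `gen_fits` and `test_fits` could be structured, reusing the `r_squared` helper sketched earlier (redefined here so the block stands alone); this mirrors the described behaviour rather than reproducing the course's exact code:

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination, as defined in the previous section."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = ((observed - predicted) ** 2).sum()
    return 1 - error / ((observed - observed.mean()) ** 2).sum()

def gen_fits(x_vals, y_vals, degrees):
    """Use polyfit to build one model (coefficient array) per requested degree."""
    return [np.polyfit(x_vals, y_vals, d) for d in degrees]

def test_fits(models, degrees, x_vals, y_vals):
    """Use polyval to predict with each model and report its R^2."""
    for model, degree in zip(models, degrees):
        predicted = np.polyval(model, x_vals)
        print(f'Degree {degree} fit, R^2 = {r_squared(y_vals, predicted):.4f}')

# Example usage with the higher degrees mentioned above:
# degrees = (2, 4, 8, 16)
# test_fits(gen_fits(x_vals, y_vals, degrees), degrees, x_vals, y_vals)
```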