9. Understanding Experimental Data (cont.) by MIT OpenCourseWare


Summary by www.lecturesummary.com: 9. Understanding Experimental Data (cont.) by MIT OpenCourseWare


    • 0:00 - 0:45 - Course Introduction and Overview

      • Reading assignment (Chapter 18) for this and subsequent lectures
      • No lecture on Wednesday (Thanksgiving break)

      Living in a Data-Intensive World

      • Increasing amounts of time are spent dealing with data, often by writing code (or having code written) to handle it
      • Focus on understanding software for data manipulation, writing such code, and interpreting software output regarding data
      • Beginning with experimental data – "statistics meets experimental science"

      0:45 - 2:28 - Collecting and Processing Experimental Data

      • Process is to perform an experiment (physics, biology, chemistry, sociology, anthropology) to collect data
      • Data types are measurements (lab) or answers (questionnaire)
      • Post-data collection: apply a model or theory to pose questions regarding the data
      • Goal: apply data and model to forecast future expectations or outcomes
      • Construct a computation to provide answers, executing a computational experiment to supplement the physical/social one
      • Example: spring modeling

      2:28 - 4:28 - Linear Springs and Hooke's Law

      • Attention given to linear springs (such as those in laboratory settings)
      • Defining property: the force needed to compress or stretch the spring varies linearly with the distance compressed or stretched
      • Characterized by a spring constant (K) that specifies how much force is required
      • Examples of K values: slinky (low K, 1 N/m), motorcycle suspension (high K, 35,000 N/m)
      • A newton is defined as the force needed to accelerate a 1 kg mass at 1 m/s²
      • Hooke's Law of Elasticity (Robert Hooke, 1676): force is linearly related to distance (F = -K*d)
      • Negative sign shows that force is opposite direction of displacement (restoring force)
      • Hooke's Law applies to a wide range of springs but is not without limits
      • Breaks down beyond the elastic limit (when the spring is stretched or compressed too far)
      • Does not work with all springs (e.g., rubber bands, recurve bows)

      4:28 - 5:48 - Using Hooke's Law (Sample Calculation)

      • Sample: determining rider mass to compress a 35,000 N/m spring by 1 cm
      • Convert distance to meters (1 cm = 0.01 m)
      • Force = K * distance = 35,000 N/m * 0.01 m = 350 Newtons
      • Applying F = ma, where the acceleration is that of gravity (about 9.81 m/s²)
      • Mass = Force / gravity = 350 N / 9.81 m/s² ≈ 35.68 kg
      • Equivalent to about 79 lbs
      • Shows how Hooke's Law can be used once K has been determined (a short code sketch follows below)
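
      A minimal sketch of this calculation in Python, using the numbers quoted above (g taken as 9.81 m/s², consistent with the ≈ 35.68 kg result):

      ```python
      # Rider-mass calculation from the lecture example.
      K = 35000.0        # spring constant, N/m
      d = 0.01           # compression, meters (1 cm)
      g = 9.81           # gravitational acceleration, m/s^2

      force = K * d      # Hooke's law magnitude: F = K * d  ->  350 N
      mass = force / g   # F = m * g  ->  m = F / g
      print(round(mass, 2), 'kg')   # ~35.68 kg, roughly 79 lbs
      ```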

      5:48 - 7:00 - Experimental Determination of Spring Constant

      • Importance of knowing the spring constant (e.g., atomic force microscopes, deformation of DNA)
      • Routine physics lab experiment: hang spring, attach mass, take displacement measurement
      • Solve K using F = K*d, which is rearranged as K = Force / distance
      • Force = mass * gravity (mass * 9.8 m/s²)
      • Ideally, a single measurement would suffice
      • In the real world, materials are not perfect and measurements are noisy, so multiple trials with varying masses are required (see the sketch below)
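
      A sketch of the idealized single-trial estimate; the mass and displacement values here are made-up examples, and real data would use many noisy trials:

      ```python
      # Hypothetical single measurement: hang a known mass, read off the displacement.
      mass = 0.5                 # kg (made-up example value)
      displacement = 0.23        # meters (made-up example value)
      g = 9.8                    # m/s^2

      force = mass * g           # force exerted by the hanging mass
      K = force / displacement   # rearranged Hooke's law: K = F / d
      print(round(K, 2), 'N/m')  # ~21.3 N/m for these example numbers
      ```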

      7:00 - 8:48 - Dealing with Experimental Data (Plotting)

      • Experimental data come from more than one trial (mass vs. displacement)
      • Ideally, the data will be linear
      • Plotting the data: independent variable (masses) on the x-axis, dependent variable (displacement) on the y-axis
      • Code walkthrough: reading the data from a file and converting it to PyLab arrays with the `array` function
      • Benefit of arrays: allows direct mathematical operations on array elements (such as scaling) without explicit loops
      • Plotting the gathered data (a sketch of these steps follows below)
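
      A hedged sketch of the reading and plotting steps; the file format (one whitespace-separated distance/mass pair per line after a header), the scaling of masses to forces, and the function names are assumptions, not the lecture's exact code:

      ```python
      import pylab

      def get_data(file_name):
          """Read whitespace-separated (distance, mass) pairs, one per line, after a header.
          The file layout here is an assumption about the lecture's data file."""
          distances, masses = [], []
          with open(file_name) as f:
              f.readline()                      # skip the header line
              for line in f:
                  d, m = line.split()
                  distances.append(float(d))
                  masses.append(float(m))
          return masses, distances

      def plot_data(file_name):
          masses, distances = get_data(file_name)
          forces = pylab.array(masses) * 9.8    # arrays allow element-wise math without loops
          distances = pylab.array(distances)
          pylab.plot(forces, distances, 'bo', label='Measured displacements')
          pylab.xlabel('|Force| (Newtons)')
          pylab.ylabel('Distance (meters)')
          pylab.legend(loc='best')
          pylab.show()
      ```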

        Fitting a Curve to Data and Measuring Goodness of Fit

        • The data do not lie exactly on a straight line, which suggests measurement noise
        • Objective: fit a line (or curve) to the data that captures the underlying relationship in spite of the noise
        • Must relate the independent (x) variable to the dependent (y) variable
        • Requires an objective function that quantifies how far a candidate line is from the data points
        • Goal: identify the line/curve with the smallest objective function value (the best fit)
        • Quantifying the distance from points to the line: the vertical displacement (difference in y-values) is used
        • Reason for using vertical displacement: we are estimating the dependent (y) value from the independent (x) value, so the uncertainty is in the y-direction

        The Objective Function: Least Squares

        • Objective function as the sum of squared differences between the observed (measured) and the predicted (from the fitted curve) y-values
        • Difference = observed_y - predicted_y
        • Squaring the difference:
          • Eliminates the sign (direction of displacement is unimportant)
          • Gives a property helpful for obtaining the best fit (explained later - results in a surface with only one minimum)
        • This is referred to as least squares
        • This sum has the form of a variance multiplied by the number of observations (equivalently, the number of observations times the average squared error)
        • Minimizing this expression therefore minimizes the variance between the estimated and measured values (a small sketch of the objective follows below)
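
        A small sketch of this objective function; the function name here is illustrative rather than from the lecture:

        ```python
        def sum_of_squared_errors(observed, predicted):
            """Least-squares objective: sum of squared vertical differences between
            observed y-values and the y-values predicted by a candidate curve."""
            total = 0.0
            for obs, pred in zip(observed, predicted):
                total += (obs - pred) ** 2   # squaring removes the sign of the difference
            return total
        ```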

        Finding the Best Fit: Polynomials and Linear Regression

        • To minimize the objective function, we must determine the parameters of the curve (e.g., the intercept and slope of a line)
        • Assume the predicted curve is modeled as a polynomial in the independent variable (x)
        • A line is a degree 1 polynomial (y = ax + b)
        • A parabola is a 2nd-degree polynomial (y = ax² + bx + c)
        • Linear regression is the method for determining the polynomial coefficients that minimize the sum of squared differences
        • Visualization: parameters (A, B for a line) create a multi-dimensional space, the objective function creates a surface over this space, the best fit is the lowest point on this surface
        • Using sum of squares ensures the surface has only one minimum
        • Linear regression finds this minimum by essentially walking downhill along the gradient of the surface ("linearly regressing" to the lowest point); a toy sketch of this downhill walk follows below
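
        A toy illustration of the "walking downhill" picture, fitting a line by gradient descent on the sum-of-squares surface; this is only a sketch of the idea (the step size and iteration count are arbitrary), and `polyfit` solves the same problem directly:

        ```python
        def fit_line_by_descent(x_vals, y_vals, steps=100000, rate=0.0001):
            """Minimize sum((y - (a*x + b))**2) over (a, b) by walking down the gradient."""
            a, b = 0.0, 0.0
            n = len(x_vals)
            for _ in range(steps):
                # Partial derivatives of the sum of squared errors with respect to a and b.
                grad_a = sum(-2 * x * (y - (a * x + b)) for x, y in zip(x_vals, y_vals))
                grad_b = sum(-2 * (y - (a * x + b)) for x, y in zip(x_vals, y_vals))
                a -= rate * grad_a / n   # step downhill along the gradient
                b -= rate * grad_b / n
            return a, b

        # Example: points near y = 2x + 1 recover a ≈ 2, b ≈ 1.
        print(fit_line_by_descent([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.9]))
        ```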

        Applying `polyfit` in PyLab

        • The PyLab library offers the `polyfit` function for linear regression
        • `polyfit(x_values, y_values, degree_n)` computes the coefficients of the best least-squares polynomial fit of degree n
        • Returns the coefficients (e.g., (a, b) for degree 1, (a, b, c) for degree 2); usage is sketched below
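
        A minimal example of calling `polyfit` (the data values here are made up):

        ```python
        import pylab

        x_vals = pylab.array([1.0, 2.0, 3.0, 4.0])
        y_vals = pylab.array([2.1, 3.9, 6.2, 7.8])   # made-up data, roughly y = 2x

        a, b = pylab.polyfit(x_vals, y_vals, 1)      # degree-1 least-squares fit
        print('slope =', a, 'intercept =', b)        # slope close to 2, intercept close to 0
        ```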

        Fitting a Line (Degree 1) to Spring Data

        • The `fit_data` code illustrates how `polyfit(x_vals, y_vals, 1)` is used (a sketch follows this list)
        • Retrieves the coefficients (a, b) of the best-fit line
        • Applies the coefficients to compute the predicted (estimated) y-values for the given x-values
        • Plots the raw data and the best-fit line
        • Spring constant calculation: with displacement on the y-axis and force (mass × g) on the x-axis, the slope a of the fitted line is 1/K, so K is recovered from the reciprocal of the slope (K ≈ 1/a)
        • Running the code shows an example fit to the spring data
        • Calculated K ≈ 21.5 (a slope of roughly 0.046, since K ≈ 1/a)
        • The visual result shows a pretty good fit to the majority of the data, though some deviation ("funky") at higher values
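
        A self-contained sketch in the spirit of `fit_data` (not the lecture's exact code); the conversion of masses to forces with g = 9.8 and the plot labels are assumptions:

        ```python
        import pylab

        def fit_spring_data(masses, distances):
            """Fit a line to (mass, displacement) measurements and estimate the
            spring constant from the slope of the fitted line."""
            forces = pylab.array(masses) * 9.8            # convert masses (kg) to forces (N)
            distances = pylab.array(distances)
            a, b = pylab.polyfit(forces, distances, 1)    # best-fit line: distance = a*force + b
            predicted = a * forces + b
            pylab.plot(forces, distances, 'bo', label='Measured displacements')
            pylab.plot(forces, predicted, 'r',
                       label='Linear fit, k = ' + str(round(1 / a, 1)))
            pylab.xlabel('|Force| (Newtons)')
            pylab.ylabel('Distance (meters)')
            pylab.legend(loc='best')
            pylab.show()
        ```
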
    • Using `polyval` and Fitting Higher Order Polynomials (Example 2)

      ```python
      polyval(model_coefficients_tuple, x_values)
      ```

      The `polyval` function in PyLab:

      • Accepts coefficients (result of `polyfit`) and x-values, giving predicted y-values.
      • Benefit: can be used for any polynomial order, making code reusable for other models (a short sketch follows below).
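
      A short sketch showing how the same `polyval` call serves both a linear and a quadratic model (the data values are made up):

      ```python
      import pylab

      x_vals = pylab.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
      y_vals = pylab.array([0.1, 1.2, 3.9, 9.1, 15.8, 25.2])   # made-up, roughly quadratic

      pylab.plot(x_vals, y_vals, 'bo', label='Data')
      for degree in (1, 2):
          model = pylab.polyfit(x_vals, y_vals, degree)   # coefficients for any degree
          estimates = pylab.polyval(model, x_vals)        # the same call works for both models
          pylab.plot(x_vals, estimates, label='Degree ' + str(degree) + ' fit')
      pylab.legend(loc='best')
      pylab.show()
      ```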

      In this section:

      • Second example data set introduced and plotted.
      • Fitting a line (degree 1) shows a "lousy fit" when visually inspected.
      • Trying a higher order model: fitting a parabola (degree 2).
      • Fitting a quadratic is also a type of linear regression (higher dimensional).
      • Visual result shows the quadratic fit is clearly better than the linear fit for the second data set.

      Evaluating Goodness of Fit (Relative vs. Absolute)

      Question: How can we objectively determine which fit is better (beyond eyeballing)?

      • Comparing fits: relative (which one is better than another?) and absolute (how close to optimal?).
      • A measure for relative fit: Average Mean Squared Error (MSE).
      • MSE = sum of squared differences / number of samples.
      • The function `get_average_error` computes this (a sketch follows this list).
      • Comparing MSE for quadratic and linear fits: MSE quadratic is six times smaller than MSE linear.
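
      A sketch of such a function; the name `get_average_error` follows this summary, and the exact signature is an assumption:

      ```python
      def get_average_error(data, predicted):
          """Average mean squared error: sum of squared differences divided by
          the number of samples."""
          error = 0.0
          for d, p in zip(data, predicted):
              error += (d - p) ** 2
          return error / len(data)
      ```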

      MSE is suitable for relative comparison but has the following limitations:

      • Not absolute: does not indicate whether an MSE value is "good" or "bad".
      • Not scale independent: varies with data values.

      Absolute Goodness of Fit: Coefficient of Determination (R²)

      The coefficient of determination (R²) is a standardized, scale-independent measure.

      • R² = 1 - (Model's sum of squared errors / Total sum of squares of the data).
      • Numerator: calculates error of the fit (sum of (observed - predicted)²).
      • Denominator: measures overall variability of the data (sum of (observed - mean of observed)²).
      • R² measures the proportion of variability in the data explained by the model.
      • For linear regression, R² will always be between 0 and 1.
      • R² = 1: Model accounts for all variability (ideal fit).
      • R² = 0: Model accounts for no variability (fit no better than simply using the mean of the data).
      • R² ≈ 0.5: Model explains about half the variability.
      • Objective: find a fit with an R² value as close to 1 as possible (a sketch of the R² computation follows below).
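
      A sketch of an R² computation matching the formula above (not necessarily the lecture's exact code):

      ```python
      import pylab

      def r_squared(observed, predicted):
          """Coefficient of determination: 1 - (model error / total variability)."""
          observed = pylab.array(observed, dtype=float)
          predicted = pylab.array(predicted, dtype=float)
          model_error = ((observed - predicted) ** 2).sum()              # sum of (observed - predicted)^2
          total_variability = ((observed - observed.mean()) ** 2).sum()  # sum of (observed - mean)^2
          return 1 - model_error / total_variability
      ```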

      Testing Fits with R² and Multiple Degrees

      The code functions `gen_fits` and `test_fits` test several polynomial degrees and report R² (sketched after this list).

      • `gen_fits` uses `polyfit` to generate models for a list of degrees.
      • `test_fits` uses `polyval` for prediction and calculates R² for each model.
      • Running with second data set and degrees 1 and 2:
        • R² for linear fit is horrible (< 0.005).
        • R² for quadratic fit is good (~0.84).
        • R² confirms quadratic is much better.
      • Running with higher degrees (2, 4, 8, 16):
        • R² values rise with degree (degree 2 ≈ 0.84; degrees 4 and 8 slightly higher).
        • Degree 16 gives a very high R² (≈ 0.97), explaining nearly 97% of the variability.
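
      A hedged sketch of what `gen_fits` and `test_fits` might look like, reusing the `r_squared` sketch above; the function bodies are assumptions based on this summary, not the lecture's exact code:

      ```python
      import pylab

      def gen_fits(x_vals, y_vals, degrees):
          """Fit one least-squares polynomial model per requested degree."""
          models = []
          for d in degrees:
              models.append(pylab.polyfit(x_vals, y_vals, d))
          return models

      def test_fits(models, degrees, x_vals, y_vals):
          """Plot each model's predictions and label it with its R^2 value."""
          pylab.plot(x_vals, y_vals, 'o', label='Data')
          for model, d in zip(models, degrees):
              est_y_vals = pylab.polyval(model, x_vals)
              rsq = r_squared(y_vals, est_y_vals)        # r_squared as sketched earlier
              pylab.plot(x_vals, est_y_vals,
                         label='Degree ' + str(d) + ', R^2 = ' + str(round(rsq, 4)))
          pylab.legend(loc='best')
          pylab.show()
      ```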