The presenter indicates that the equation σ_mean ~ σ_x / √N gives an optimistic (i.e., too small) estimate of the uncertainty when N is small.
The CLT may be extended to correlated variables, producing various formulas that incorporate the covariance. Estimating the covariance itself also requires enough samples to converge.
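For reference, the standard form of this result (not written out explicitly in the lecture; it assumes each sample has the same variance σ_x²) is:

```latex
% Variance of the sample mean when the x_i are correlated
% (assuming each x_i has the same variance \sigma_x^2):
\mathrm{Var}(\bar{x})
  = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Cov}(x_i, x_j)
  = \frac{\sigma_x^2}{N} + \frac{2}{N^2} \sum_{i<j} \mathrm{Cov}(x_i, x_j)
```

For uncorrelated samples the covariance terms vanish and this reduces to σ_mean = σ_x / √N.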
Applications
20:00 - End (approx.) - Applications: Monte Carlo and Model-Data Comparison
Formulas analogous to those for the mean and variance of a single variable apply to the average value of a function of random variables.
The uncertainty in the mean value of a function also decreases with the number of trials (N), with the same 1/√N scaling when N is large.
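A minimal numerical check of this scaling (not from the lecture; the function f and the distribution of x below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.square  # arbitrary function f(x) = x^2, chosen only for illustration

for N in [100, 1_000, 10_000, 100_000]:
    # For each N, estimate the mean of f(x) many times and look at the
    # spread (standard deviation) of those estimates across trials.
    estimates = [f(rng.standard_normal(N)).mean() for _ in range(200)]
    print(f"N = {N:7d}   std of mean-of-f estimates = {np.std(estimates):.5f}")

# The printed spread should drop by roughly sqrt(10) with each tenfold
# increase in N, i.e., it scales as 1/sqrt(N).
```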
Convergence Properties
The convergence properties under consideration give rise to two main applications:
- Model comparisons vs. experiments (main focus for this group).
- Numerical integration with Monte Carlo methods.
Monte Carlo Methods
Principle: to approximate the integral of a function f(x) weighted by a PDF p_x(x) (i.e., the average of f(x)), generate samples of x from p_x(x) and take the mean of the resulting f(x) values.
The average of these sampled f(x) values will tend towards the correct integral value, and the error in this average reduces as 1/√N.
Monte Carlo methods are random (stochastic) and often simple to program. They can be efficient, especially for high-dimensional integrals, where deterministic approaches (such as quadrature) become extremely challenging.
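A minimal sketch of the principle above (not from the lecture; the integrand f and the choice of p_x as a standard normal are arbitrary for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Arbitrary integrand, chosen only for illustration.
    return np.cos(x) ** 2

# Approximate E[f(x)] = integral of f(x) * p_x(x) dx, with p_x a standard
# normal, by sampling x from p_x and averaging f over the samples.
N = 100_000
x = rng.standard_normal(N)
fx = f(x)

estimate = fx.mean()
# Standard error of the mean: sigma_f / sqrt(N), i.e., the 1/sqrt(N) scaling.
std_error = fx.std(ddof=1) / np.sqrt(N)

print(f"Monte Carlo estimate: {estimate:.4f} +/- {std_error:.4f}")
```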
Comparing Models vs Data
Shift to the main subject: Comparing Models vs Data.
Experimental Setup
Experimental setup is defined in terms of:
- Knobs (X): Experiment controls (e.g., valve settings, temperature conditions, chemicals).
- Parameters (θ): Factors that impact the experiment but aren't under the experimenter's control (e.g., rate coefficients, molecular weight, apparatus length). Parameters can be well known or highly uncertain, but they are all fixed once the experiment is set up.
- Measurables (Y): The experimental results obtained.
A model predicts the measurable values Y as a function of the knobs X and the parameters θ. This model can be computationally complex (e.g., involve differential equations).
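As a concrete illustration of this structure (a hypothetical example, not from the lecture), a simple model mapping one knob and one parameter to a measurable:

```python
import numpy as np

def y_model(t, k):
    """Hypothetical model: first-order decay of a concentration.

    t : knob (X)      -- sampling time chosen by the experimenter
    k : parameter (θ) -- rate coefficient, not directly controlled
    Returns the predicted measurable (Y): the remaining concentration,
    assuming an initial concentration of 1.0.
    """
    return np.exp(-k * t)

# Predictions at several knob settings for an assumed rate coefficient.
times = np.array([0.0, 1.0, 2.0, 4.0])
print(y_model(times, k=0.5))
```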
Experimental data (Y_data) are acquired by adjusting the knobs X.
- Y_data will typically not exactly agree with the model prediction because of experimental error.
- There are usually several experimental results for a given setting of the knobs because of repetitions.
A problem is that the model itself may be faulty, though the initial presumption will be that the model is good and the problem lies with the parameter values.
Dealing with Discrepancies
How to deal with discrepancies between model and data:
- One very common approach is to minimize a measure of the error by adjusting the model parameters (θ).
- The usual measure to minimize is the sum of the squared deviations between the observed data and the model predictions.
- These squared deviations are usually weighted by the variance of the measurements at each point: Sum_i [(Y_data,i - Y_model,i)^2 / σ_i^2].
- This particular form is motivated by the fact that minimizing this quantity is equivalent to maximizing the probability (likelihood) of having observed the experimental data, assuming the deviations are Gaussian distributed with the given variances.
- Adjusting the parameters to minimize this sum forces the model to agree more closely with the experiment (see the sketch below).
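A minimal sketch of this weighted least-squares minimization, reusing the hypothetical decay model above with synthetic data made up purely for illustration (scipy.optimize.minimize is one standard choice; the actual fitting procedures are covered in the next lecture):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def y_model(t, k):
    # Hypothetical model: first-order decay (see the earlier sketch).
    return np.exp(-k * t)

# Synthetic "experimental" data: knob settings, repeated noisy measurements,
# and the per-point variance of the resulting mean values.
t_knobs = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
true_k = 0.8
y_reps = y_model(t_knobs, true_k) + 0.02 * rng.standard_normal((10, t_knobs.size))
y_data = y_reps.mean(axis=0)
var = y_reps.var(axis=0, ddof=1) / y_reps.shape[0]  # variance of the mean

def chi_squared(theta):
    # Sum over points of (Y_data - Y_model)^2 / variance.
    k = theta[0]
    return np.sum((y_data - y_model(t_knobs, k)) ** 2 / var)

result = minimize(chi_squared, x0=[0.5])
print("best-fit rate coefficient:", result.x[0])
```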
The following lecture discusses the actual procedures for carrying out this minimization. Suggested resources include course notes and the text.