27. Probability Theory 2 by MIT OpenCourseWare


Summary by www.lecturesummary.com: 27. Probability Theory 2 by MIT OpenCourseWare


  0:00 - 3:00 (approx.) - Course Administration and Homework Discussion

    • Greeting and recognition of the Creative Commons license for the material.
    • Reference to the previous homework problem.
    • Mesh size required for convergence was stated to be of order L.
    • Comparison of student solutions with a COMSOL solution.
    • Student solutions were said to agree to within about 3 or 4%.
    • Discussion regarding manipulating million by million matrices in programs such as MATLAB.
    • MATLAB is able to handle such large matrices through sparse allocation, which is "pretty amazing".
    • Opportunity for additional questions regarding the homework problem is provided.

    3:00 - 5:00 (approx.) - Introduction to Probability Formulas

    • Review of fundamental probability, particularly the formula for the probability of A or B: P(A or B) = P(A) + P(B) - P(A and B).
    • Conditional probabilities are introduced.
    • P(A and B) can be expressed as P(A) * P(B|A).
    • Equivalently, P(A and B) = P(B) * P(A|B).
    • Conditional probability asks for the probability of one event given that another event is known to have occurred (see the sketch after this list).
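As an illustration (not part of the lecture), here is a minimal Python sketch that checks these identities by simulation. The two-dice events A ("first die is even") and B ("sum exceeds 7") are arbitrary choices made only for this example.

```python
import random

random.seed(0)
trials = 200_000
count_A = count_B = count_AB = 0

for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    A = d1 % 2 == 0        # event A: first die is even (arbitrary example event)
    B = d1 + d2 > 7        # event B: the sum exceeds 7 (arbitrary example event)
    count_A += A
    count_B += B
    count_AB += (A and B)

P_A, P_B, P_AB = count_A / trials, count_B / trials, count_AB / trials

# P(A or B) = P(A) + P(B) - P(A and B)
print(P_A + P_B - P_AB)

# P(A and B) = P(A) * P(B|A) = P(B) * P(A|B)
print(P_AB, P_A * (count_AB / count_A), P_B * (count_AB / count_B))
```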

    5:00 - 11:00 (approx.) - Polymer Example and Flory Distribution

    • Example from Beers' textbook of polymer chemistry is presented to demonstrate probability ideas.
    • Polymers are produced from reacting monomers.
    • Monomers contain acceptor (A) and donor (D) groups that bond to each other.
    • Statistics of polymer chain lengths (e.g., weight percent, average molecular weight) are discussed.
    • Approach: Imagine a random polymer chain beginning from an unreacted D end.
    • Probability of chain length is derived from the probability of reaction (P).
    • Probability of being a monomer = 1 - P (the first bond did not form).
    • Probability of being a dimer = P * P(next group unreacted | first group reacted) = P * (1 - P), since the reactions are assumed uncorrelated.
    • Probability of being a trimer = P^2 * (1 - P).
    • Probability of being an N-mer (a chain of N units) = P^(N-1) * (1 - P).
    • Expectation value (average) of the chain length is determined.
    • Concentration of oligomers (polymers with n units) is discussed.
    • Flory distribution is introduced, holding when reactions are uncorrelated and no loops are present.
    • The width of the distribution is significant for achieving the desired material properties.
    • Optimal chain length for an application is discussed.
    • Dispersion about the optimum value is important.
    • Variance of the chain length distribution is specified.
    • Standard deviation is the square root of the variance (σ_n).
    • A dispersity measure used in polymer science is the relative width of the distribution, σ_n / E[N].
    • It is noted that the chain-length probabilities sum to one.
    • Contrast between probability/number average and weight percent is discussed.
    • Weight percent takes into account the mass contribution per chain length.
    • Common practice in polymer science is to state whether a number average or a weight average was used (both are computed in the sketch after this list).
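A short sketch of the Flory statistics described above, assuming (hypothetically) a conversion p = 0.95 and truncating the sum at a large N_max; the closed-form results 1/(1-p), p/(1-p)^2, and (1+p)/(1-p) are the standard values for this distribution.

```python
import numpy as np

p = 0.95                        # assumed probability that a given bond has formed
N = np.arange(1, 2001)          # chain lengths 1..N_max (truncation approximates the infinite sum)

prob_N = p ** (N - 1) * (1 - p)     # Flory distribution: P(chain is an N-mer)

number_avg = np.sum(N * prob_N)                     # E[N], should equal 1 / (1 - p)
variance = np.sum(N**2 * prob_N) - number_avg**2    # Var[N], should equal p / (1 - p)^2
sigma_n = np.sqrt(variance)
dispersity = sigma_n / number_avg                   # relative width sigma_n / E[N]

weight_frac = N * prob_N / np.sum(N * prob_N)       # weight fraction of each chain length
weight_avg = np.sum(N * weight_frac)                # weight-average length, (1 + p) / (1 - p)

print(prob_N.sum())                      # probabilities sum to ~1
print(number_avg, variance, dispersity)
print(weight_avg)
```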

    11:00 - 15:00 (approx.) - Probability for Continuous Variables and Sampling

      • Shift from discrete variables (chain length) to continuous variables.
      • Introducing Probability Density Functions (PDFs) for continuous variables.
      • The chance that a variable X lies in an interval [x, x + dx] is provided by px(x) * dx, where px(x) is the PDF.
      • PDFs have units of 1 / units of the variable (e.g., 1/meter for height). Multiplying by dx (which has units of x) makes the probability dimensionless.
      • The integral of the PDF over all possible values of x must equal one (normalization).
      • Integrals are used to compute mean and variance for continuous variables rather than sums.
      • Mean (E[X]): Integration of x * px(x) * dx.
      • Variance (σ_x^2): Integration of x^2 * px(x) * dx - (E[X])^2.
      • The mean value of any function f(x) is the integral of f(x) * px(x) * dx.
      • The variance of a function f(x) is also obtained through integration.
      • Concept of sampling from a PDF.
      • Drawing a value of x from a PDF means values where px(x) is high are more likely to be selected.
      • This can be done mathematically using random number generators.
      • Experimental measurements are essentially sampling from a PDF. Each measurement samples a value from an underlying distribution of potential outcomes, even if the experimenter is unaware of the shape of the PDF.
      • Dealing with multiple variables measured at the same time.
      • This needs a multivariable PDF (e.g., px(x1, x2, ...)).
      • Experimentalists typically measure several things (e.g., flow rate and temperature).
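A sketch of these definitions using an exponential PDF as an arbitrary example; the rate constant and sample size are assumptions made only for illustration.

```python
import numpy as np
from scipy import integrate

lam = 2.0
p = lambda x: lam * np.exp(-lam * x)   # example PDF; has units of 1/x, so p(x)*dx is dimensionless

# normalization, mean, and variance computed as integrals
norm, _ = integrate.quad(p, 0, np.inf)                    # should be 1
mean, _ = integrate.quad(lambda x: x * p(x), 0, np.inf)   # E[X]
ex2, _ = integrate.quad(lambda x: x**2 * p(x), 0, np.inf)
var = ex2 - mean**2                                       # sigma_x^2 = E[X^2] - E[X]^2

# sampling from the PDF: values where p(x) is large are drawn more often
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1 / lam, size=100_000)

print(norm, mean, var)
print(samples.mean(), samples.var())   # sample statistics approach the integral values
```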

      15:00 - 20:00 (approx.) - Analysis of Experimental Data and the Central Limit Theorem

      • Emphasis on repeated measurements (experiments).
      • Repeats of the same measurement will vary (e.g., height, weight, blood pressure).
      • The experimental mean is calculated.
      • It's assumed that when the number of experiments (N) increases to infinity, the experimental average approaches the true value (the mean of the underlying PDF). This is described as an "article of faith" or assumption.
      • Formal introduction of the Central Limit Theorem (CLT).
      • This is a "sensational theorem" in statistics.
      • What the CLT says: As the number of trials (N) becomes very large, the distribution of the mean of the readings converges to a Gaussian (bell curve), independent of the form of the original distribution px(x).
      • The theorem holds when the repeated trials are independent and, if several variables are measured, the variables are uncorrelated (diagonal covariance matrix).
      • A key implication: The variance of the mean (σ_mean^2 or σ_avg^2) goes inversely with the number of samples: σ_mean^2 tends to 1/N * σ_x^2 (where σ_x^2 is the variance of the individual measurements).
      • So, the standard deviation (uncertainty) in the mean scales as 1/√N.
      • Significance of the CLT for experimentalists.
      • The uncertainty in the average measurement gets smaller and smaller as the number of repeated experiments increases (proportional to 1/√N).
      • The variation of the individual measurements (σ_x^2) converges to a fixed value (the true variance) as N gets larger; it does not converge to zero.
      • Practical issue with the CLT: The theorem asks that N be "large," but does not state how large.
      • Experimentalists tend to carry out a limited number of repeats (e.g., 9 observations for the initial report of Higgs boson discovery).
      • Applying Gaussian equations (derived from the CLT) with small N can result in misestimating the accuracy of results, usually underestimating the actual uncertainty.
      • Small samples will not necessarily have a Gaussian distribution, and the "tails" (low probability events) of the actual distribution may be significant but are not well represented.
      • The presenter indicates that the equation σ_mean ~ σ_x / √N gives an optimistic estimate of uncertainty when N is small.

        The CLT may be extended to correlated variables, producing various formulas incorporating covariance. The covariance must also converge using enough samples.
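A sketch of both claims, using a uniform (deliberately non-Gaussian) distribution and arbitrary sample sizes: the spread of the average shrinks as σ_x/√N, while the spread of the individual measurements stays fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_x = np.sqrt(1 / 12)    # true standard deviation of a uniform(0, 1) measurement

for n in (4, 16, 64, 256):
    data = rng.random((10_000, n))    # 10,000 repeated experiments of n measurements each
    means = data.mean(axis=1)
    # the spread of the averages follows sigma_x / sqrt(n); the spread of
    # the individual measurements does not shrink with n
    print(n, means.std(), sigma_x / np.sqrt(n), data.std())
```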

        20:00 - End (approx.) - Applications: Monte Carlo and Model-Data Comparison

        Formulas analogous to the mean and variance of a single variable also apply to the average value of a function of the variables.

        The mean value uncertainty of a function also decreases with the number of trials (N), with the same 1/√N scaling when N is large.

        Convergence Properties

        The convergence properties under consideration give rise to two main applications:

        • Model comparisons vs. experiments (main focus for this group).
        • Numerical integration with Monte Carlo methods.

        Monte Carlo Methods

        Principle: To approximate an integral of a function f(x) weighted by a PDF px(x) (i.e., the average of f(x)), it is possible to generate samples of x from px(x) and take the mean of the resulting values of f(x).

        The average of these sampled f(x) values will tend towards the correct integral value, and the error in this average reduces as 1/√N.

        Monte Carlo methods are random (stochastic) and may be simple to program. They can be efficient, especially when dealing with high-dimensional integrals where deterministic approaches (such as quadrature) are extremely challenging.
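A sketch of this idea under assumed choices: samples are drawn from a standard normal PDF and f(x) = x^2, so the exact value of the weighted integral is 1; the printed standard error shows the 1/√N behavior.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x**2    # example integrand; E[f(X)] = 1 for a standard normal X

for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)        # samples drawn from the PDF p(x)
    fx = f(x)
    estimate = fx.mean()              # Monte Carlo estimate of the integral
    std_err = fx.std() / np.sqrt(n)   # uncertainty in the estimate, shrinks as 1/sqrt(N)
    print(n, estimate, std_err)
```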

        Comparing Models vs Data

        Shift to the main subject: Comparing Models vs Data.

        Experimental Setup

        Experimental setup is defined in terms of:

        • Knobs (X): Experiment controls (e.g., valve settings, temperature conditions, chemicals).
        • Parameters (θ): Factors that impact the experiment but aren't under the experimenter's control (e.g., rate coefficients, molecular weight, apparatus length). Parameters can be very certain or very uncertain, but they're all fixed once set.
        • Measurables (Y): The experimental results obtained.

        A model predicts the measurable values Y as a function of the knobs X and the parameters θ. This model can be computationally complex (e.g., involve differential equations).

        Experimental data (Y_data) are acquired by adjusting the knobs X.

        • Y_data will typically not exactly agree with the model prediction because of experimental error.
        • There are usually several experimental results for a given setting of the knobs because of repetitions.

        A problem is that the model itself may be faulty, though the initial presumption will be that the model is good and the problem lies with the parameter values.

        Dealing with Discrepancies

        How to deal with discrepancies between model and data:

        • One very popular approach is to minimize a measure of the error by adjusting the model parameters (θ).
        • The usual measure to minimize is the sum of the squared deviations between the observed data and the model's prediction.
        • These squared differences are usually weighted by the variance of the measurements at each point: Sum [(Y_data - Y_model)^2 / variance] (see the sketch after this list).
        • This particular form is motivated by the fact that minimizing it is equivalent to maximizing the probability (likelihood) of having observed the experimental data, provided the deviations are Gaussian with the stated variances.
        • Adjusting the parameters to minimize this sum forces the model to agree more closely with the experiment.
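A sketch of this weighted least-squares idea; the linear model, the parameter names, the synthetic data, and the noise level are all illustrative assumptions rather than the lecture's example.

```python
import numpy as np
from scipy.optimize import minimize

# knob settings X and synthetic measurements Y_data with known noise level (illustrative only)
X = np.linspace(0, 10, 20)
sigma = 0.5
rng = np.random.default_rng(3)
Y_data = 2.0 * X + 1.0 + rng.normal(0, sigma, X.size)

def model(X, theta):
    # placeholder model Y(X, theta); a real model might require solving differential equations
    return theta[0] * X + theta[1]

def objective(theta):
    # sum of squared deviations, each weighted by the measurement variance
    return np.sum((Y_data - model(X, theta)) ** 2 / sigma**2)

result = minimize(objective, x0=[1.0, 0.0])
print(result.x)   # parameter values that make the model best match the data
```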

        The following lecture discusses the actual procedures for carrying out this minimization. Suggested resources include course notes and the text.