28. Models vs. Data 1 by MIT OpenCourseWare

Description


Summary by www.lecturesummary.com: 28. Models vs. Data 1 by MIT OpenCourseWare


  • Introduction to Bayesian and Prior Estimation

    • Beginning with the assumption of absolute certainty in a model and parameters.
    • Designing experiments with repeated measurements of an observable predicted by the model.
    • Noticing that repeated measurements differ from one another and are not exactly equal to the value the model predicts.
    • Asking the question: What is the chance of finding a particular measurement value?

    Probability of Observing a Measurement

    • Describing the chance that an experimental measurement lies in a narrow interval.
    • This probability is given by an integral of the probability density function over that interval.
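
As a worked statement of that integral (a sketch in the summary's notation; y_0 and Delta y simply label the ends of the narrow interval):

```latex
P\!\left(y_0 \le y \le y_0 + \Delta y\right)
  = \int_{y_0}^{\,y_0 + \Delta y} p(y)\,\mathrm{d}y
  \approx p(y_0)\,\Delta y
  \qquad \text{for a sufficiently narrow interval } \Delta y .
```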

    Probability Density Function of the Mean Measurement

    • The distribution of the mean of the measurements converges to a Gaussian distribution for a sufficiently large number of repeats, by the Central Limit Theorem.
    • Showing the formula for the probability density of observing an experimental mean value y_exp, assuming the model is correct and the parameters known: p(y_exp | model, theta, x) = (1 / (sigma * sqrt(2 pi))) * exp(-0.5 * ((y_exp - y_model) / sigma)^2). (The integral over a narrow interval reduces to this density times dy.)
    • Pointing out that this equation is pivotal for the rest of the lecture.
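
A minimal numerical sketch of this density, assuming illustrative values for the model prediction, the measured mean, and sigma (none of these numbers come from the lecture):

```python
import numpy as np

def gaussian_density_of_mean(y_exp, y_model, sigma):
    """Probability density of observing the experimental mean y_exp when the
    model (assumed exactly correct) predicts y_model; sigma is the standard
    deviation of the mean, not of a single measurement."""
    z = (y_exp - y_model) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative numbers: the model predicts 1.00, the measured mean is 1.03,
# and the standard deviation of the mean is 0.02.
print(gaussian_density_of_mean(1.03, 1.00, 0.02))
```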

    Question regarding Sigma

    • A question is raised regarding the definition of sigma in the formula for probability density.

    Definition of Sigma and Mean's Variance

    • Explaining that sigma in the formula is the standard deviation of the mean, not the standard deviation of a single measurement (sigma_y).
    • Describing how the variance of individual measurements relates to the variance of the mean: variance of the mean = sigma_y^2 / n, so sigma (the standard deviation of the mean) = sigma_y / sqrt(n). (The lecturer's shorthand "variance of the mean = (1/n) * sigma_y" is correct only if sigma_y there denotes the variance of individual measurements rather than their standard deviation.)
    • The greater the number of repeats n, the lower the uncertainty in the mean, because of the 1/sqrt(n) dependence of sigma (1/n for the variance).
    • Explaining how the running average of the measurements settles toward a value as n increases, while the uncertainty of that average shrinks.
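
A short simulation sketch of the 1/sqrt(n) behavior, assuming Gaussian single-measurement noise with an illustrative sigma_y and true value (not numbers from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_y, true_value = 0.5, 2.0   # illustrative single-measurement sigma and true mean

for n in (10, 100, 1000):
    # Repeat an n-measurement experiment 10,000 times and look at the spread of its mean.
    means = rng.normal(true_value, sigma_y, size=(10_000, n)).mean(axis=1)
    print(n, round(means.std(), 4), round(sigma_y / np.sqrt(n), 4))  # empirical vs. sigma_y / sqrt(n)
```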

    Use Case 1: Model Validation

    • Employing experiments to establish confidence in or validate a model and its parameters, particularly to skeptics.
    • One common, qualitative technique is plotting experimental data points versus the model curve and determining whether or not they "look good" (e.g., a parity plot).
    • The quantitative method is to apply the probability formula to determine how probable the observed data are under the assumed model and parameters; this is more demanding and is often skipped in practice.
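
One common way to make the quantitative check concrete is a chi-squared comparison, assuming independent Gaussian errors; the data, predictions, and sigmas below are hypothetical, and the lecture does not prescribe this particular test:

```python
import numpy as np
from scipy import stats

# Hypothetical measured means, model predictions, and standard deviations of each mean.
y_exp   = np.array([0.98, 2.05, 2.96])
y_model = np.array([1.00, 2.00, 3.00])
sigma   = np.array([0.05, 0.05, 0.05])

# Qualitative check: on a parity plot, the points should hug the 45-degree line.
# Quantitative check: how improbable is this much disagreement if the model is right?
chi2 = np.sum(((y_exp - y_model) / sigma) ** 2)
p_value = stats.chi2.sf(chi2, df=len(y_exp))  # probability of an even worse mismatch by chance
print(chi2, p_value)
```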

    Use Case 2: Refuting a Model or Parameters

    • This entails demonstrating that the observed experimental data is extremely unlikely if the model, parameter settings, and experimental conditions are all assumed perfectly accurate.
    • A low chance of seeing the data indicates that something is amiss (the model, the parameters, the knobs on the experiment, or the measurements).
    • Writing papers that refute common beliefs can be fun, but such papers are very likely to be retracted if they turn out to rest on experimental mistakes or misinterpretation.
    • The next step is to relax these assumptions.
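
A small illustration of that logic with hypothetical numbers: if the measured mean sits many sigma away from the prediction, the probability of such a deviation under the assumed model is tiny, so something must be wrong somewhere.

```python
from scipy import stats

# Hypothetical: the model predicts 1.00, the experimental mean is 1.25,
# and the standard deviation of the mean is 0.05.
z = (1.25 - 1.00) / 0.05           # the observation sits 5 sigma from the prediction
tail = 2 * stats.norm.sf(abs(z))   # two-sided probability of a deviation at least this large
print(z, tail)                     # ~6e-7: the model, the parameters, or the experiment is suspect
```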

    Use Case 3: Parameter Refinement

    • Relaxing the case that parameter values are known exactly.
    • Employing experimental data (Y) to make an inference or improve the values of unknown parameters (theta).
    • Discussing two primary perspectives on this: Least Squares Fitting and the Bayesian Perspective.
    • Defining Bayes' Theorem in terms of conditional probability: P(A and B) = P(A) * P(B|A) = P(B) * P(A|B).
    • Using Bayes' Theorem on measurements (Y) and parameters (theta): P(Y | theta) = P(Y) * P(theta | Y) / P(theta).

    • Bayes' Theorem and Parameter Estimation

      • Highlighting the outcome of interest: the probability of parameter values based on the measurements, P(theta | Y).
      • Expressing the terms:
      • P(theta | Y): The posterior distribution of the parameters after the experiment.
      • P(theta): The prior distribution of the parameters before the experiment.

      The Likelihood Function

      • P(Y | theta): Probability of seeing the data Y under parameter settings theta. This is the likelihood function.

      Determining Prior and Posterior

      • Sorting out which term in Bayes' Theorem is the prior (P(theta)) and which is the posterior (P(theta | Y)).

      Right Notation and The Prior

      • Fixing notation to employ probability density functions (lowercase 'p') for continuous variables.
      • The proper equation is p(theta | Y) = p(theta) * p(Y | theta) / p(Y).
      • Expanding on the prior p(theta): It is the original information or distribution of belief regarding the parameter values prior to the experiment.

      Calculating the Posterior

      • The posterior distribution p(theta | Y) is proportional to the product of the prior p(theta) and the likelihood p(Y | theta).
      • The denominator p(Y) is just a normalizing constant.
      • This procedure combines the new experimental data (the likelihood) with all prior information (the prior).
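
A minimal grid-based sketch of posterior = prior x likelihood / normalization for a single parameter; the prior width, the data value, and the "model predicts y = theta" setup are illustrative assumptions, not the lecture's example:

```python
import numpy as np

# Grid over a single illustrative parameter theta; the "model" simply predicts y = theta.
theta = np.linspace(0.0, 2.0, 2001)
dtheta = theta[1] - theta[0]

prior = np.exp(-0.5 * ((theta - 1.0) / 0.5) ** 2)           # broad prior belief about theta
y_exp, sigma = 1.2, 0.05                                    # experimental mean and its sigma
likelihood = np.exp(-0.5 * ((y_exp - theta) / sigma) ** 2)  # p(Y | theta)

posterior = prior * likelihood
posterior /= posterior.sum() * dtheta   # dividing by p(Y), the normalizing constant

mean = (theta * posterior).sum() * dtheta
std = np.sqrt(((theta - mean) ** 2 * posterior).sum() * dtheta)
print(mean, std)   # central value and uncertainty extracted from the posterior
```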

      Results of Bayesian Analysis

      • A well-designed, accurate experiment will yield a narrow likelihood function, resulting in a sharp posterior distribution.
      • Even if the experiment is not ideal, the posterior distribution is generally tighter than the prior.
      • The central value and uncertainty range can be extracted from the posterior distribution.
      • A practical difficulty is that models tend to have multiple parameters.

      Multi-dimensional Parameter Space

      • Displaying the Bayesian process with two parameters (e.g., K and theta2).
      • The prior is a multi-dimensional distribution.
      • The likelihood of the experiment is a multi-dimensional distribution.
      • The posterior is the product of prior and likelihood.
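
A two-parameter version of the same sketch, assuming (purely for illustration) that the experiment mainly constrains the sum K + theta2:

```python
import numpy as np

# Two illustrative parameters, following the summary's K and theta2.
k   = np.linspace(0.0, 2.0, 201)
th2 = np.linspace(0.0, 2.0, 201)
K, T2 = np.meshgrid(k, th2, indexing="ij")

prior = np.exp(-0.5 * (((K - 1.0) / 0.4) ** 2 + ((T2 - 1.0) / 0.4) ** 2))  # broad 2-D prior

# Suppose the experiment mainly pins down K + theta2, so the likelihood is a narrow ridge.
y_exp, sigma = 2.1, 0.05
likelihood = np.exp(-0.5 * ((y_exp - (K + T2)) / sigma) ** 2)

posterior = prior * likelihood
posterior /= posterior.sum() * (k[1] - k[0]) * (th2[1] - th2[0])  # normalizing constant

# The posterior is a narrow diagonal band: the parameter space has contracted even though
# neither K nor theta2 is pinned down individually by this experiment alone.
w = posterior / posterior.sum()
print((w * K).sum(), (w * T2).sum())   # posterior means of the two parameters
```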

      Parameter Space Contraction and Least Squares

      • Experiments limit the range of parameter values consistent with the data.
      • The aim is to contract the parameter space.
      • Relationship to Least Squares Fitting: The least squares objective function stems from the exponent of the Gaussian probability distribution.
      • Minimizing the sum-of-squares term in the exponent of the likelihood is equivalent to maximizing the probability.
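
A worked statement of that connection, assuming independent Gaussian errors with known sigmas (notation follows the summary):

```latex
\max_{\theta}\;\prod_i \frac{1}{\sigma_i\sqrt{2\pi}}
  \exp\!\left[-\frac{1}{2}\left(\frac{y_{\mathrm{exp},i}-y_{\mathrm{model},i}(\theta)}{\sigma_i}\right)^{\!2}\right]
\;\Longleftrightarrow\;
\min_{\theta}\;\sum_i \left(\frac{y_{\mathrm{exp},i}-y_{\mathrm{model},i}(\theta)}{\sigma_i}\right)^{\!2}
```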

      Least Squares Fitting Details

      • For least squares, the parameters are optimized to minimize the weighted sum of squared differences between the data and the model.
      • Least squares generally needs at least as many independent data points as adjustable parameters, and preferably more.
      • Experiments are best performed at a variety of conditions so that the data actually constrain the parameters.
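
A short weighted least-squares sketch; the model form y = a * exp(b * x), the data, and the sigmas are hypothetical stand-ins chosen only to show the mechanics:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical measurements taken at several different conditions x,
# each with the standard deviation of its mean.
x     = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_exp = np.array([1.02, 1.62, 2.71, 4.55, 7.30])
sigma = np.array([0.05, 0.05, 0.08, 0.10, 0.15])

def model(theta, x):
    a, b = theta                 # illustrative two-parameter model y = a * exp(b * x)
    return a * np.exp(b * x)

def weighted_residuals(theta):
    return (y_exp - model(theta, x)) / sigma   # each residual scaled by its own uncertainty

fit = least_squares(weighted_residuals, x0=[1.0, 1.0])
print(fit.x)   # best-fit values of the adjustable parameters a and b
```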

      Dealing with Multiple Data Points and Covariance

      • Introduce multiple observables measured at multiple experimental conditions.
      • Every measurement is associated with a variance of the mean.
      • Recognizing that errors between measurements may be correlated.
      • The model calculates corresponding values for every observable at every condition.

      Multi-variate Probability Density and Residuals

      • Letting the residual vector be the difference between observed data and model prediction.
      • The probability density of observing the residual vector is a multi-variate Gaussian: p(epsilon) = (2 pi)^(-m/2) * det(C)^(-1/2) * exp(-0.5 * epsilon^T C^-1 epsilon), where C is the covariance matrix of the measurement errors and m is the number of measurements.
      • This equation is the multi-measurement version of the previous probability density equation.
      • Least Squares from Multi-variate Gaussian

        • Fancy method: Optimize C as well as theta, though this is computationally involved.
        • Standard practice: Calculate C experimentally and consider it as a constant.
        • Constant C: Maximizing the probability density (the multi-variate Gaussian) is the same as minimizing the exponent term: epsilon transpose C inverse epsilon.
        • General form: This is the general form of least squares fitting.
        • Uncorrelated errors: Least squares can assume uncorrelated errors (diagonal C) or equal, uncorrelated errors (C proportional to the identity), which gives ordinary unweighted least squares.
        • Key consideration: The choice of which parameters (theta) to vary depends critically on the problem. Varying known parameters can produce unphysical solutions. The Bayesian concept of a prior determines which parameters have strong constraints and should be kept fixed.
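
A sketch of the general form with C estimated from experiment and held fixed; the model and numbers are the same hypothetical ones as in the earlier least-squares sketch, and C is taken diagonal here only for brevity:

```python
import numpy as np
from scipy.optimize import minimize

# With C measured and held constant, fit theta by minimizing eps^T C^{-1} eps.
x     = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_exp = np.array([1.02, 1.62, 2.71, 4.55, 7.30])
C = np.diag([0.05, 0.05, 0.08, 0.10, 0.15]) ** 2   # diagonal (uncorrelated errors) for brevity;
C_inv = np.linalg.inv(C)                           # off-diagonal terms would encode correlations

def model(theta, x):
    a, b = theta
    return a * np.exp(b * x)       # same illustrative model as above

def objective(theta):
    eps = y_exp - model(theta, x)  # residual vector epsilon
    return eps @ C_inv @ eps       # epsilon^T C^{-1} epsilon

fit = minimize(objective, x0=np.array([1.0, 1.0]))
print(fit.x)
```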

        Parameter Selection and Transformation

        • Adjustable parameters: Frequently, only some parameters are regarded as adjustable in least squares since others can be well-determined by prior knowledge or other measurements.
        • Parameter Transformation: Occasionally, an experiment is more suited to determining a set of parameters rather than single parameters.
        • Example: An A <=> B equilibrium reaction.
        • Equilibrium measurements: Yield the ratio KF/KR (the equilibrium constant).
        • Time constant: The sum KF + KR is determined from the relaxation time constant measured in short-time kinetics experiments.
        • Linear combination: One experiment may determine a linear combination such as W1 = KF + KR exactly, even though KF and KR themselves would be ill-determined from that experiment.
        • Parameter shifting: Shifting to parameters (e.g., W1 = KF+KR and W2 = KF/KR) enables fixing poorly determined parameters from this experiment (e.g., W2, potentially fixed from thermodynamics) while fitting the determinable combination (W1).
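
A sketch of why those particular combinations appear, assuming a closed first-order reversible A <=> B system (so the total concentration is constant): the relaxation toward equilibrium decays at rate KF + KR, while the equilibrium composition fixes the ratio KF / KR.

```latex
\frac{\mathrm{d}[A]}{\mathrm{d}t} = -K_F\,[A] + K_R\,[B]
\quad\Longrightarrow\quad
[A](t) - [A]_{\mathrm{eq}} \;\propto\; e^{-(K_F + K_R)\,t},
\qquad
\frac{[B]_{\mathrm{eq}}}{[A]_{\mathrm{eq}}} = \frac{K_F}{K_R}.
```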

        Comparing Perspectives

        • Least Squares: Looks for the "best fit" parameters, and needs sufficient data to determine the parameters being varied.
        • Bayesian Perspective: Concentrates on reducing the uncertainty range over the multi-dimensional parameter space; it can deliver updated information even if no individual parameter is uniquely defined through the experiment by itself. The result is reported as a multi-dimensional probability distribution.