6. Singular Value Decomposition; Iterative Solutions of Linear Equations by MIT OpenCourseWare

Summary by www.lecturesummary.com: 6. Singular Value Decomposition; Iterative Solutions of Linear Equations by MIT OpenCourseWare


  • 0:00 - Introduction and Recap

    • This is the last lecture on linear algebra; it covers singular value decomposition (SVD).
    • Reviewing prior topics: basics and transformations of matrices.
    • Analogy between vectors in finite dimensions and functions in infinite dimensions.
    • Analogy between matrices/transformations and differential operators.
    • Introducing the eigenvalue/eigenfunction problem in function space, using the second derivative operator as an example.
    • Establishing boundary conditions for the differential operator.

    1:30 - Solving the Eigenvalue/Eigenfunction Problem Example

    • Working through the example of calculating eigenfunctions and eigenvalues for the second derivative operator with the given boundary conditions.
    • Considering potential solutions: exponentials and trigonometric functions.
    • Trigonometric functions are more appropriate in this situation.
    • Applying the boundary condition y(0) = 0 to solve for a coefficient.
    • Applying the second boundary condition y(L) = 0 to solve for the eigenvalues.
    • The boundary condition plays the role that the secular or characteristic polynomial plays in linear algebra.
    • Finding the eigenvalues: λ_n = -(nπ/L)^2 for n = 1, 2, 3, …; there are infinitely many eigenvalues.
    • The eigenvectors (eigenfunctions) are scalar multiples of sin(sqrt(-λ_n) x) = sin(nπx/L); a numerical check appears in the sketch after this list.
    • Verification of the one-to-one correspondence between linear algebra concepts and linear differential equations.
    • Reference to orthogonal functions used to represent solutions of differential equations.
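
A minimal numerical sketch (not from the lecture, written in Python/NumPy rather than the lecture's MATLAB): discretize the second-derivative operator on [0, L] with y(0) = y(L) = 0 using the standard central-difference matrix and compare its eigenvalues closest to zero with the analytic values λ_n = -(nπ/L)^2. The interval length and grid size are arbitrary choices for illustration.

```python
# Sketch: finite-difference approximation of d^2/dx^2 on [0, L] with
# Dirichlet boundary conditions, compared against lambda_n = -(n*pi/L)^2.
import numpy as np

L_len = 1.0          # interval length (arbitrary choice for this sketch)
N = 200              # number of interior grid points
h = L_len / (N + 1)  # grid spacing

# Central-difference matrix: -2 on the diagonal, 1 on the off-diagonals.
D2 = (np.diag(-2.0 * np.ones(N)) +
      np.diag(np.ones(N - 1), 1) +
      np.diag(np.ones(N - 1), -1)) / h**2

eigvals = np.sort(np.linalg.eigvalsh(D2))[::-1]  # closest to zero first

for n in range(1, 4):
    exact = -(n * np.pi / L_len) ** 2
    print(f"n={n}: numerical {eigvals[n - 1]:10.4f}   exact {exact:10.4f}")
```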

    3:30 - Applications of Eigenvalue/Eigenfunction Concepts

    • Classic chemical engineering illustration: quantum mechanics, where wave functions are the eigenfunctions and energy levels correspond to the eigenvalues.
    • Mechanical analog: the buckling of an elastic column (e.g., spaghetti).
    • The balance of linear momentum and bending moment leads to a differential equation.
    • Buckling occurs when the applied compressive load exceeds the threshold set by the first eigenvalue of this equation.
    • The Eiffel Tower is an example of a structure designed against buckling using this principle.

    5:50 - Introduction to Singular Value Decomposition (SVD)

    • Summary of eigendecomposition for square matrices.
    • Posing the question of what if the matrix A is not square (n x m, where n ≠ m).
    • Defining Singular Value Decomposition as the corresponding decomposition for non-square matrices.
    • SVD form: A = U * Sigma * V^dagger.
    • Definition of the dagger symbol: conjugate transpose (transpose and take complex conjugate).
    • Discussion of the properties of the matrices in the decomposition:
      • U: n x n square matrix, maps from R^n to R^n (or C^n to C^n).
      • Sigma: n x m matrix with real, non-negative entries (the singular values) on its main diagonal and zeros everywhere else; it is generally not square.
      • V: m x m square matrix, maps from R^m to R^m (or C^m to C^m).
      • The columns of U and V are referred to as the left and right singular vectors, respectively (see the sketch after this list).
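
A short sketch of the decomposition in Python/NumPy (an illustration added to this summary, not the lecture's MATLAB demo): np.linalg.svd returns U, the singular values, and Vh = V^dagger for a non-square matrix, and the product reconstructs A.

```python
# Sketch: the SVD A = U * Sigma * V^dagger of a non-square matrix in NumPy.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))          # n x m with n = 4, m = 3

U, s, Vh = np.linalg.svd(A)              # full SVD by default
Sigma = np.zeros(A.shape)                # n x m, zero except on the diagonal
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vh.shape)    # (4, 4) (4, 3) (3, 3)
print(np.allclose(A, U @ Sigma @ Vh))    # reconstruction: True
print(s)                                 # singular values: real and >= 0
```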

    6:55 - Existence and Properties of SVD

    • Query about which matrices have SVD.
    • Every matrix has a singular value decomposition. This is in contrast to eigendecomposition, which may not exist for every matrix (e.g., defective matrices whose eigenvectors do not form a complete set).
    • U and V properties: They are unitary matrices.
    • For real matrices, their transpose is their inverse (U^T U = I, V^T V = I).
    • For complex matrices, their conjugate transpose is their inverse (U^dagger U = I, V^dagger V = I).
    • Unitary matrices do not stretch vectors; they act like rotation matrices.

      • Connection between SVD and the eigendecomposition of A^dagger A and A A^dagger.
      • V is the collection of eigenvectors of A^dagger A.
      • U is the collection of eigenvectors of A A^dagger.
      • Sigma^2 contains the eigenvalues of A^dagger A and A A^dagger.
      • The entries of Sigma are precisely the singular values of A (see the check in the sketch after this list).
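
A quick check of this correspondence, again a NumPy sketch with an arbitrary random matrix: the squared singular values match the eigenvalues of A^dagger A and A A^dagger, and the columns of V are eigenvectors of A^dagger A (here A is real, so the dagger reduces to the transpose).

```python
# Sketch: V holds eigenvectors of A^T A, U holds eigenvectors of A A^T,
# and the squared singular values are their non-zero eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)

# Eigenvalues of A^T A (3x3) and A A^T (4x4), sorted in descending order.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

print(np.allclose(eig_AtA, s**2))             # True
print(np.allclose(eig_AAt[:len(s)], s**2))    # True (remaining eigenvalue is ~0)

# Columns of V are eigenvectors of A^T A (up to sign):
for j in range(3):
    v = Vh[j]                                 # j-th right singular vector
    print(np.allclose(A.T @ A @ v, s[j]**2 * v))  # True
```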

      Applications of SVD: Null Space and Range

      • Illustrating SVD calculation (e.g., in MATLAB) using examples.
      • SVD offers immediate access to the null space and the range of a matrix A.
      • The columns of V corresponding to the zero singular values (the zero columns of Sigma) span the null space of A.
      • The columns of U corresponding to the non-zero singular values span the range of A (see the sketch after this list).
      • Examples showing this for a 4x3 and a 3x4 matrix.
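
A sketch of reading off the null space and range from the SVD, using a small rank-deficient 3x4 example chosen for illustration (not the matrices used in the lecture):

```python
# Sketch: null space and range of A directly from its SVD.
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],      # multiple of row 1 -> rank deficient
              [1.0, 0.0, 1.0, 0.0]])     # 3 x 4 matrix

U, s, Vh = np.linalg.svd(A)
tol = 1e-12
r = np.sum(s > tol)                      # numerical rank

null_space = Vh[r:].T                    # columns of V beyond the rank
range_basis = U[:, :r]                   # first r columns of U

print("rank:", r)
print(np.allclose(A @ null_space, 0))    # null-space vectors are mapped to 0
# Every column of A should be expressible in the range basis:
coeffs, *_ = np.linalg.lstsq(range_basis, A, rcond=None)
print(np.allclose(range_basis @ coeffs, A))   # True
```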

      Applications of SVD: Data Compression

      • Using SVD on a data matrix representation (e.g., a bitmap image such as a fingerprint).
      • The magnitude of a singular value indicates how much information it carries; larger values are more significant.
      • Data compression is possible by ignoring singular values of size less than a threshold value and their respective singular vectors.
      • Example: compressing a 300x300 pixel fingerprint image by retaining only the 50 largest singular values, which still gives a faithful reconstruction (a schematic version appears in the sketch below).
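
A schematic version of the compression idea in NumPy; a synthetic 300x300 array stands in for the fingerprint bitmap, and the truncation rank k = 50 mirrors the example above. Keeping the k largest singular values gives the best rank-k approximation of the matrix.

```python
# Sketch: data compression by truncating the SVD of a 300x300 "image".
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 300)
# Smooth structured pattern plus a little noise (stand-in for a bitmap).
img = np.outer(np.sin(6 * np.pi * x), np.cos(4 * np.pi * x)) \
      + 0.05 * rng.standard_normal((300, 300))

U, s, Vh = np.linalg.svd(img, full_matrices=False)

k = 50                                        # keep the 50 largest singular values
img_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]

rel_err = np.linalg.norm(img - img_k) / np.linalg.norm(img)
storage = (U[:, :k].size + k + Vh[:k].size) / img.size
print(f"relative error: {rel_err:.3e}, storage fraction: {storage:.2f}")
```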

      Applications of SVD: Solving Linear Equations (Least Squares)

      • Using SVD to find the least squares solution to the equation Ax = b, especially when A is non-square.
      • It is an alternative to the normal-equations approach (A^T A x = A^T b), which can only be used for over-determined systems.
      • The problem of least squares is to determine the vector x that minimizes the norm of the error vector ||Ax - b||.
      • Substituting A = U Sigma V^dagger into ||Ax - b||.
      • Utilizing the fact that unitary matrices (U) leave the 2-norm of a vector unchanged.
      • The problem simplifies to minimizing ||Sigma y - p||, where y = V^dagger x and p = U^dagger b.

      Least Squares Solution Derivation

      • Let r be the number of non-zero singular values (which is also the rank of A).
      • The expression to minimize can be written as a sum of squared terms: ||Sigma y - p||^2 = sum_{i=1}^{r} (σ_i y_i - p_i)^2 + sum_{i=r+1}^{n} p_i^2.
      • The objective is to choose the values of y that minimize this sum.
      • For the terms whose singular value σ_i is non-zero, the best choice is y_i = p_i / σ_i.
      • If the rank r equals n (the number of rows of A), there are no leftover p_i^2 terms, so y_i = p_i / σ_i gives an exact solution (see the sketch after this list).
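
A sketch of the full-rank least-squares recipe just derived, with an arbitrary overdetermined system for illustration: form p = U^dagger b, divide by the singular values, and map back with V. The result is compared against NumPy's own least-squares solver.

```python
# Sketch: least-squares solution of an overdetermined Ax = b via the SVD.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))          # overdetermined: 6 equations, 3 unknowns
b = rng.standard_normal(6)

U, s, Vh = np.linalg.svd(A, full_matrices=False)
p = U.T @ b                              # p = U^dagger b (real case: transpose)
y = p / s                                # y_i = p_i / sigma_i (A has full column rank here)
x_svd = Vh.T @ y                         # x = V y

# Compare with NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_svd, x_lstsq))       # True
```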

      Underdetermined Systems and Minimum Norm Solution

      • Considering the scenario where r < m (the number of columns of A), which represents an underdetermined system.
      • There are y components (from r+1 to m) that cannot be determined by the equations.
      • External information must be used to select values for these undefined y components.
      • One standard convention is to set these extra components of y to zero, which yields the minimum norm least squares solution.
      • Having determined y (partly fixed by p, partly by convention), x is recovered from x = V * y (see the sketch after this list).
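
A sketch of the minimum-norm convention for an underdetermined system (random illustrative data, not the lecture's example): the free components of y are set to zero, and the result coincides with the pseudoinverse solution.

```python
# Sketch: minimum-norm least-squares solution of an underdetermined system.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))          # underdetermined: 3 equations, 5 unknowns
b = rng.standard_normal(3)

U, s, Vh = np.linalg.svd(A)              # U (3x3), s (3,), Vh (5x5)
r = np.sum(s > 1e-12)                    # rank

y = np.zeros(5)                          # components r..m-1 stay zero (convention)
y[:r] = (U.T @ b)[:r] / s[:r]            # y_i = p_i / sigma_i for i < r
x_min_norm = Vh.T @ y                    # x = V y

print(np.allclose(A @ x_min_norm, b))                  # exact here, since r = n
print(np.allclose(x_min_norm, np.linalg.pinv(A) @ b))  # same as the pseudoinverse
```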

        SVD Overview

        • SVD enables solving both overdetermined and underdetermined least squares problems.

        Computational Complexity

        • Computation of SVD is costly, approximately order n^3.
        • This is similar to exact methods such as Gaussian elimination.
        • For extremely large problems, O(n^3) computational complexity is usually too high.

        Introduction to Iterative Solutions

        • For cases where exact techniques (such as Gaussian elimination or SVD) are too costly computationally for large matrices, approximate iterative methods are utilized.
        • Iterative methods attempt to arrive at a solution to within a specified tolerance.
        • Such algorithms iteratively improve an initial guess (x_i) using a linear map to obtain a more accurate guess (x_{i+1}).
        • The map is generally of the type: x_{i+1} = C * x_i + c.
        • The process converges if the iterations get closer to a constant value (x_{i+1} ≈ x_i).
        • If the process converges, the converged value is a solution to the original system Ax = b.
        • The key is choosing C and c such that the map converges to the desired solution.

        Jacobi Iteration

        • An example of an iterative method: Jacobi iteration.
        • The strategy is to split matrix A into a diagonal part (D) and an off-diagonal part (R) (A = D + R).
        • The original equation Ax = b is converted into an iterative map: x_{i+1} = D^-1 * (-R * x_i + b).
        • Jacobi iteration converts an O(n^3) problem into a sequence of cheap steps: since D is diagonal, applying D^-1 costs only O(n).
        • Convergence is required for this to be a feasible process (see the sketch after this list).
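
A minimal Jacobi iteration sketch on a small diagonally dominant system (the matrix is an arbitrary illustrative choice, not from the lecture), iterated until successive guesses agree to a tolerance.

```python
# Sketch: Jacobi iteration x_{i+1} = D^{-1} (b - R x_i).
import numpy as np

A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])       # diagonally dominant
b = np.array([6.0, 25.0, -11.0])

D_inv = np.diag(1.0 / np.diag(A))        # trivial to invert: D is diagonal
R = A - np.diag(np.diag(A))              # off-diagonal part

x = np.zeros(3)                          # initial guess
for i in range(100):
    x_new = D_inv @ (b - R @ x)
    if np.linalg.norm(x_new - x) < 1e-10:
        break
    x = x_new

print(i, x_new)                          # iteration count and solution
print(np.allclose(A @ x_new, b))         # True
```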

        Convergence Analysis

        • How to determine if an iterative method will converge.
        • Subtracting the exact solution x from the iterative equation in order to examine how the error propagates (error_{i+1} in terms of error_i).
        • Convergence is ensured if the norm of the iteration matrix (e.g., -D^-1 * R for Jacobi) is less than 1.
        • Such convergence is referred to as linear convergence. The error is decreased by a constant factor in every iteration.

        Diagonally Dominant Matrices

        • Jacobi iteration convergence is ensured for diagonally dominant matrices.
        • A matrix is diagonally dominant if the absolute value of every diagonal element is larger than the sum of the absolute values of the off-diagonal elements in the same column or row.
        • Diagonally dominant matrices occur frequently in physical problems (a quick check appears in the sketch below).
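
A short sketch combining the two points above: verify row diagonal dominance and compute the spectral radius of the Jacobi iteration matrix -D^-1 R, which must be below 1 for convergence. The matrix is the same illustrative one used in the Jacobi sketch.

```python
# Sketch: diagonal dominance check and convergence rate of Jacobi iteration.
import numpy as np

A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])

diag = np.abs(np.diag(A))
off_diag_sum = np.sum(np.abs(A), axis=1) - diag
print("diagonally dominant (by rows):", np.all(diag > off_diag_sum))

D = np.diag(np.diag(A))
R = A - D
C = -np.linalg.inv(D) @ R                # Jacobi iteration matrix
rho = np.max(np.abs(np.linalg.eigvals(C)))
print("spectral radius:", rho)           # < 1: error shrinks by ~rho per step
```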

        Gauss-Seidel Method

        • Yet another iterative method.
        • Splitting matrix A into a lower triangular component (L) that includes the diagonal and a strictly upper triangular component (U) (A = L + U).
        • Reducing Ax = b to an iterative map: x_{i+1} = L^-1 * (-U * x_i + b).
        • Solving for x_{i+1} amounts to solving a triangular system by substitution, an O(n^2) operation per iteration. This is still cheaper than O(n^3) elimination (see the sketch after this list).
        • Convergence for Gauss-Seidel is guaranteed for diagonally dominant matrices or symmetric positive definite matrices (all eigenvalues > 0).
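
A Gauss-Seidel sketch on the same illustrative matrix, assuming SciPy is available for the triangular solve; each step solves L x_{i+1} = b - U x_i by forward substitution rather than forming L^-1 explicitly.

```python
# Sketch: Gauss-Seidel iteration with A = L + U (L includes the diagonal).
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])

L_part = np.tril(A)                      # lower triangle, diagonal included
U_part = np.triu(A, k=1)                 # strictly upper triangle

x = np.zeros(3)
for i in range(100):
    # O(n^2) forward substitution instead of a full O(n^3) solve:
    x_new = solve_triangular(L_part, b - U_part @ x, lower=True)
    if np.linalg.norm(x_new - x) < 1e-10:
        break
    x = x_new

print(i, x_new, np.allclose(A @ x_new, b))
```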

        Comparison and Application of Iterative Methods

        • Illustrative example comparing Jacobi and Gauss-Seidel on a small system.
        • Both are linearly convergent.
        • Gauss-Seidel converged more rapidly than Jacobi in the example.
        • Iterative methods produce results only to finite precision, so a tolerance needs to be specified.
        • Iterative techniques are employed to solve big linear systems where exact algorithms are too time-consuming, such as in hydrodynamics problems (e.g., flows at low Reynolds numbers).

          Advanced Methods

          • Reference to more sophisticated methods like PCG (Preconditioned Conjugate Gradient).

          Computational Crossover Point and Constraints

          • For small matrices, exact algorithms (such as elimination) can be quicker because of their small constant prefactors, despite their greater O(n^3) complexity.
          • The crossover point at which iterative techniques become faster occurs at fairly large n (such as n = 500 or 1200) on today's computers.
          • Iterative techniques may find application in environments with limited computation or memory, like on embedded platforms. They may be slower but possible when direct methods are not.

          Successive Over-Relaxation (SOR)

          • A method to encourage convergence of iterated maps, even if they do not converge well by themselves.
          • Applicable to both linear and nonlinear iterative maps.
          • If an iterative map is x_{i+1} = f(x_i), the SOR-modified map is x_{i+1} = (1 - ω) * x_i + ω * f(x_i), where ω (omega) is the relaxation parameter.
          • SOR does not alter the converged solution but influences the rate of convergence.
          • By varying ω, generally between 0 and 1, convergence can usually be accelerated or enhanced.

          SOR with Jacobi and Gauss-Seidel

          • SOR may be used with the Jacobi and Gauss-Seidel methods.
          • For Jacobi, the mapping becomes x_{i+1} = (1 - ω) * x_i + ω * [D^-1 * (-R * x_i + b)].
          • A sufficiently small value of ω can cause the effective iteration matrix to be diagonally dominant, ensuring convergence.
          • The relaxation parameter ω shifts the eigenvalues of the effective iteration matrix, which can pull them inside the unit circle and thereby encourage convergence.
          • Applying SOR can ensure convergence, although the resulting convergence may be slow (see the sketch after this list).
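
A sketch of SOR applied to the Jacobi map, with an illustrative 2x2 matrix (not from the lecture) chosen so that plain Jacobi diverges: a relaxation parameter ω between 0 and 1 pulls the eigenvalues of the effective iteration matrix inside the unit circle and restores convergence.

```python
# Sketch: SOR-modified Jacobi map x_{i+1} = (1 - w) x_i + w D^{-1} (b - R x_i).
import numpy as np

A = np.array([[ 1.0,  2.0],
              [-2.0,  1.0]])             # plain Jacobi diverges for this matrix
b = np.array([1.0, 2.0])

D_inv = np.diag(1.0 / np.diag(A))
R = A - np.diag(np.diag(A))

def iterate(w, n_steps=200):
    """Run the SOR-modified Jacobi map with relaxation parameter w."""
    x = np.zeros(2)
    for _ in range(n_steps):
        x = (1 - w) * x + w * (D_inv @ (b - R @ x))
        if np.linalg.norm(x) > 1e12:
            return None                  # diverging
    return x

print("w = 1.0 (plain Jacobi):", iterate(1.0))    # None: the map blows up
x_sor = iterate(0.2)                              # under-relaxed map converges
print("w = 0.2:", x_sor, "residual:", np.linalg.norm(A @ x_sor - b))
```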

          Quiz Information

          • Announcement of scheduled time and place for future quizzes.
          • Quizzes will be held in the evening, from 7 pm to 9 pm, in the gymnasium.
          • Dates are listed in the syllabus.
          • Scheduling conflicts with other exams are acknowledged.