Lesson 0: Matrices and Vectors

Overview: Why Matrix Algebra?

Univariate statistics is concerned with a single random scalar variable \(Y\).

In multivariate analysis, we are concerned with the joint analysis of multiple dependent variables. These variables can be represented using matrices and vectors, which simplifies the notation and provides a compact format for expressing important formulas.

Example 0-1

Suppose that we measure the variables \(x_1\) = height (cm), \(x_2\) = left forearm length (cm) and \(x_3\) = left foot length (cm) for participants in a study of the physical characteristics of adult humans. These three variables can be represented in the following column vector:

\[\mathbf{x}= \left(\begin{array}{l}x_1\\ x_2\\x_3 \end{array}\right)\]

The observed data for a specific individual, say the \(i^{th}\) individual, can be represented by an analogous vector. Suppose that the \(i^{th}\) person in the sample has height = 175 cm, forearm length = 25.5 cm and foot length = 27 cm. In vector notation, these observed data could be written as:

\[\mathbf{x_i} = \left(\begin{array}{l}x_{i1}\\x_{i2}\\x_{i3}\end{array}\right)=\left(\begin{array}{l}175\\25.5\\27.0\end{array}\right)\]

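Notice the use and placement of the subscript \(i\) to represent the \(i^{th}\) individual: in \(x_{i1}\), \(x_{i2}\) and \(x_{i3}\), the first subscript identifies the individual and the second identifies the variable.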
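As an illustration only, the observed vector can be stored in Python as a nested list of rows, one entry per row, so that it behaves like a 3 × 1 matrix. The nested-list layout and the helper name `element` are assumptions made for this sketch; they are not part of the lesson, which uses SAS or Minitab.

```python
# Column vector for the i-th individual, stored as a list of rows with one
# entry each (an illustrative layout, not from the lesson).
x_i = [
    [175.0],  # x_i1: height (cm)
    [25.5],   # x_i2: left forearm length (cm)
    [27.0],   # x_i3: left foot length (cm)
]

# Hypothetical helper: look up x_ij using the 1-based variable index j
# from the text, converting it to Python's 0-based indexing.
def element(vector, j):
    return vector[j - 1][0]

print(element(x_i, 2))  # 25.5
```

The second subscript in the text maps directly to the position in the list, which is why the helper subtracts one.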

Definitions of Matrix and Vector

Matrix: A matrix is a two-dimensional array of numbers or formulas.

Vector: A vector is a matrix with either only one column or only one row.

Column vector: A column vector contains only one column.
Row vector: A row vector contains only one row.
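The vector definitions can be checked mechanically. The sketch below assumes a matrix is stored as a Python list of equal-length rows; the function name `vector_kind` is hypothetical. A 1 × 1 matrix satisfies both definitions, and this sketch simply reports it as a column vector.

```python
# Hypothetical helper (not from the lesson): classify a matrix stored as a
# list of equal-length rows according to the definitions above.
def vector_kind(matrix):
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if n_cols == 1:
        return "column vector"  # only one column (1 x 1 also lands here)
    if n_rows == 1:
        return "row vector"     # only one row
    return "not a vector"

print(vector_kind([[175.0], [25.5], [27.0]]))  # column vector
print(vector_kind([[175.0, 25.5, 27.0]]))      # row vector
print(vector_kind([[1, 2], [3, 4]]))           # not a vector
```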

Dimension of a Matrix: The dimension of a matrix is expressed as the number of rows × the number of columns. A matrix with 10 rows and 3 columns is said to be 10 × 3. The vectors written in Example 0-1 above are 3 × 1 matrices.
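To make the rows × columns convention concrete, here is a minimal sketch that reports the dimension of a matrix stored as a list of rows. The function name `dimension` and the list-of-rows storage are assumptions for illustration.

```python
# Hypothetical helper (not from the lesson): return (rows, columns) for a
# matrix stored as a list of rows.
def dimension(matrix):
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    # A nested list whose rows differ in length is not a valid matrix.
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same number of columns")
    return n_rows, n_cols

# The column vector of measurements from Example 0-1 is 3 x 1.
r, c = dimension([[175.0], [25.5], [27.0]])
print(f"{r} x {c}")  # 3 x 1

# A matrix with 10 rows and 3 columns is 10 x 3.
print(dimension([[0] * 3 for _ in range(10)]))  # (10, 3)
```

The number of rows always comes first, matching the order used in the definition.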

Square Matrix: A square matrix has the same number of rows and columns; for example, a 4 × 4 matrix is a square matrix.
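A square matrix can be recognized by checking that every row has as many entries as there are rows. This is a sketch under the same assumed list-of-rows storage; `is_square` is a hypothetical name.

```python
# Hypothetical helper (not from the lesson): True when the number of rows
# equals the number of columns.
def is_square(matrix):
    # Every row must contain exactly len(matrix) entries.
    return all(len(row) == len(matrix) for row in matrix)

four_by_four = [[0] * 4 for _ in range(4)]
print(is_square(four_by_four))               # True
print(is_square([[175.0], [25.5], [27.0]]))  # False (3 x 1)
```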

The Data Matrix in Multivariate Problems

Usually, the observed data are represented by a matrix in which the rows are observations and the columns are variables. This is exactly the way the data are normally prepared for statistical software such as SAS or Minitab.

The usual notation is n = the number of observed units (people, animals, companies, etc.) and p = the number of variables measured on each unit. Thus the data matrix will be an n × p matrix.
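The rows-are-observations layout can be sketched by stacking each unit's measurements as one row, which yields \(n\) and \(p\) directly. The helper name `build_data_matrix` is hypothetical, and plain Python lists stand in for the data formats used by SAS or Minitab; the scores are those of Example 0-2.

```python
# Hypothetical helper (not from the lesson): stack one observation vector
# per unit as the rows of an n x p data matrix and return (X, n, p).
def build_data_matrix(observations):
    X = [list(obs) for obs in observations]
    n = len(X)                  # number of observed units
    p = len(X[0]) if n else 0   # number of variables per unit
    if any(len(row) != p for row in X):
        raise ValueError("every unit must have the same number of variables")
    return X, n, p

# (verbal, science) scores for the six students in Example 0-2
scores = [(41, 26), (39, 26), (53, 21), (67, 33), (61, 27), (67, 29)]
X, n, p = build_data_matrix(scores)
print(n, p)  # 6 2
```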

Example 0-2

Suppose that we have scores for n = 6 college students who have taken the verbal and the science subtests of the College Qualification Test (CQT). We have p = 2 variables: (1) the verbal score and (2) the science score for each student. The data matrix is the following 6 × 2 matrix:

\[\mathbf{X}=\left(\begin{array}{ll}41&26\\39&26\\53&21\\67&33\\61&27\\67&29\end{array}\right)\]

In the matrix just given, the first column gives the data for \(x_1\) = verbal score whereas the second column gives data for \(x_2\) = science score. Each row gives data for a student in the sample. To repeat – the rows are observations, the columns are variables.
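Because rows are observations and columns are variables, pulling out one student means taking a row, and pulling out one variable means taking a column. The sketch below assumes the list-of-rows storage used for illustration here; `row` and `column` are hypothetical helper names that use the 1-based indices of the text.

```python
# Data matrix from Example 0-2: rows are students, columns are
# (verbal score, science score).
X = [
    [41, 26],
    [39, 26],
    [53, 21],
    [67, 33],
    [61, 27],
    [67, 29],
]

# Hypothetical helpers (not from the lesson), using 1-based indices.
def row(matrix, i):
    """Return the i-th observation: one student's scores."""
    return matrix[i - 1]

def column(matrix, j):
    """Return the j-th variable across all observations."""
    return [r[j - 1] for r in matrix]

print(row(X, 4))     # [67, 33]
print(column(X, 1))  # [41, 39, 53, 67, 61, 67]
```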

Notation notes

Note that we have used a lowercase \(\textbf{x}\) to denote the vector of variables in Example 0-1 and an uppercase \(\textbf{X}\) to represent the data matrix in Example 0-2. It should also be noted that, in matrix terms, the \(i^{th}\) row of the data matrix \(\textbf{X}\) is the transpose of the data vector

\[\mathbf{x_i}=\left(\begin{array}{l}x_{i1}\\x_{i2}\end{array}\right),\]

defined in the same way as the data vectors in Example 0-1.
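The row/vector relationship can be checked numerically: transposing the \(i^{th}\) row of \(\textbf{X}\) (a 1 × 2 matrix) gives the 2 × 1 data vector for that student. This is a sketch assuming list-of-rows storage; `transpose` is a hypothetical helper built on Python's built-in `zip`.

```python
# Hypothetical helper (not from the lesson): transpose a matrix stored as a
# list of rows by regrouping its columns into rows.
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

# Data matrix from Example 0-2
X = [[41, 26], [39, 26], [53, 21], [67, 33], [61, 27], [67, 29]]

i = 3
row_i = [X[i - 1]]      # the i-th row of X as a 1 x 2 matrix
x_i = transpose(row_i)  # the corresponding 2 x 1 data vector
print(x_i)              # [[53], [21]]
```

Transposing twice returns the original matrix, so the same helper also turns a column vector back into a row of \(\textbf{X}\).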