Mean Section

The specific factors or random errors all have mean zero: \(E(\epsilon_i) = 0\); i = 1, 2, ... , p

The common factors, the f's, also have mean zero: \(E(f_i) = 0\); i = 1, 2, ... , m
A consequence of these assumptions is that the mean response of the ith trait is \(\mu_i\). That is,
\(E(X_i) = \mu_i\)
Variance Section

The common factors have variance one: \(\text{var}(f_i) = 1\); i = 1, 2, ... , m
The specific factors have variance \(\psi_i\): \(\text{var}(\epsilon_i) = \psi_i\); i = 1, 2, ... , p. Here, \(\psi_i\) is called the specific variance.
Correlation Section

The common factors are uncorrelated with one another: \(\text{cov}(f_i, f_j) = 0\) for i ≠ j

The specific factors are uncorrelated with one another: \(\text{cov}(\epsilon_i, \epsilon_j) = 0\) for i ≠ j

The specific factors are uncorrelated with the common factors: \(\text{cov}(\epsilon_i, f_j) = 0\); i = 1, 2, ... , p; j = 1, 2, ... , m
These assumptions are necessary to estimate the parameters uniquely. An infinite number of equally well-fitting models with different parameter values may be obtained unless these assumptions are made.
Under this model the variance for the ith observed variable is equal to the sum of the squared loadings for that variable and the specific variance:
The variance of trait i is: \(\sigma^2_i = \text{var}(X_i) = \sum_{j=1}^{m}l^2_{ij}+\psi_i\)
This derivation is based on the previous assumptions. \(\sum_{j=1}^{m}l^2_{ij}\) is called the communality for variable i. Later on, we will see that this is a measure of how well the model performs for that particular variable. The larger the communality, the better the model performance for the ith variable.
The covariance between pairs of traits i and j is: \(\sigma_{ij}= \text{cov}(X_i, X_j) = \sum_{k=1}^{m}l_{ik}l_{jk}\)
The covariance between trait i and factor j is: \(\text{cov}(X_i, f_j) = l_{ij}\)
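These element-wise identities can be checked numerically. The sketch below uses a small hypothetical loading matrix and specific variances (all values chosen purely for illustration) to compute the variance of each trait and the covariance between a pair of traits from the loadings:

```python
import numpy as np

# Hypothetical loadings for p = 3 traits on m = 2 common factors,
# plus specific variances (illustrative values only).
L = np.array([[0.8, 0.3],
              [0.6, 0.5],
              [0.4, 0.7]])
psi = np.array([0.2, 0.3, 0.1])

# Variance of trait i: sum of squared loadings (the communality) + specific variance
communality = (L ** 2).sum(axis=1)
var_X = communality + psi

# Covariance between traits 1 and 2: sum over k of l_1k * l_2k
cov_X1_X2 = L[0] @ L[1]

print(var_X)      # var(X_1) = 0.64 + 0.09 + 0.2 = 0.93, etc.
print(cov_X1_X2)  # 0.8*0.6 + 0.3*0.5 = 0.63
```

Note that the covariance between traits depends only on the loadings, not on the specific variances, which enter only the diagonal (variance) terms.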
In matrix notation, our model for the variance-covariance matrix is expressed as shown below:
\(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\)
This is the matrix of factor loadings times its transpose, plus a diagonal matrix containing the specific variances.
Here \(\boldsymbol{\Psi}\) equals:
\(\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\)
A parsimonious (simplified) model for the variance-covariance matrix is obtained and used for estimation.
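The matrix expression \(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\) packages the scalar variance and covariance formulas into one equation. A minimal sketch, again with purely illustrative loadings and specific variances, assembling \(\Sigma\) this way:

```python
import numpy as np

# Hypothetical loadings and specific variances (illustrative values only).
L = np.array([[0.8, 0.3],
              [0.6, 0.5],
              [0.4, 0.7]])
Psi = np.diag([0.2, 0.3, 0.1])  # specific variances on the diagonal

# Parsimonious model for the variance-covariance matrix: Sigma = L L' + Psi
Sigma = L @ L.T + Psi

# The diagonal entries are communality + specific variance;
# the off-diagonal entries come entirely from L L'.
print(Sigma)
```

The resulting matrix is symmetric, as a variance-covariance matrix must be.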
Notes Section
 The model assumes that the data is a linear function of the common factors. However, because the common factors are not observable, we cannot check for linearity.
 The variance-covariance matrix is a symmetric matrix; that is, the covariance between variables i and j is the same as the covariance between variables j and i. For this model:
\(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\)
The variance-covariance matrix \(\Sigma\) has p(p + 1)/2 unique elements, which are approximated by:
 mp factor loadings in the matrix \(\mathbf{L}\), and
 p specific variances
This means that there are mp + p parameters in the factor model for the variance-covariance matrix. Ideally, mp + p is substantially smaller than p(p + 1)/2. However, if m is too small, the mp + p parameters may not be adequate to describe \(\Sigma\). It may also be the case that this is not the right model at all, and the data cannot be reduced to a linear combination of factors.
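The parameter counts are easy to compare directly. A short sketch for an illustrative choice of p and m (any values could be substituted):

```python
# Count unique elements of Sigma vs. parameters in the factor model,
# for an illustrative p = 10 variables and m = 2 factors.
p, m = 10, 2

unique_sigma = p * (p + 1) // 2  # unique elements of a symmetric p x p matrix
model_params = m * p + p         # mp factor loadings + p specific variances

print(unique_sigma, model_params)  # 55 vs. 30: the factor model is more parsimonious
```

As p grows, p(p + 1)/2 grows quadratically while mp + p grows only linearly in p, so the savings become more pronounced for larger numbers of variables.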
 If we have more than one variable in our analysis, that is, if p > 1, the model is inherently ambiguous. To see this, let \(\mathbf{T}\) be any m x m orthogonal matrix. A matrix is orthogonal if its inverse is equal to the transpose of the original matrix:
\(\mathbf{T'T = TT' = I} \)
We can write our factor model in matrix notation:
\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{LTT'f}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{L^*f^*}+\boldsymbol{\epsilon}\)
Note that this does not change the model, because multiplying by the identity matrix leaves any matrix unchanged. This results in an alternative factor model, where the relationship between the new factor loadings and the original factor loadings is:
\(\mathbf{L^*} = \textbf{LT}\)
and the relationship between the new common factors and the original common factors is:
\(\mathbf{f^*} = \textbf{T'f}\)
This gives a model that fits equally well. Moreover, because there are infinitely many orthogonal matrices, there are infinitely many alternative models. This alternative model, as it turns out, satisfies all of the assumptions discussed earlier.
Note...
\(E(\mathbf{f^*}) = E(\textbf{T'f}) = \textbf{T'}E(\textbf{f}) = \mathbf{T'0} =\mathbf{0}\),
\(\text{var}(\mathbf{f^*}) = \text{var}(\mathbf{T'f}) = \mathbf{T'}\text{var}(\mathbf{f})\mathbf{T} = \mathbf{T'IT} = \mathbf{T'T} = \mathbf{I}\)
and
\(\text{cov}(\mathbf{f^*, \boldsymbol{\epsilon}}) = \text{cov}(\mathbf{T'f, \boldsymbol{\epsilon}}) = \mathbf{T'}\text{cov}(\mathbf{f, \boldsymbol{\epsilon}}) = \mathbf{T'0} = \mathbf{0}\)
So f* satisfies all of the assumptions, and hence f* is an equally valid collection of common factors. There is a certain apparent ambiguity to these models. This ambiguity is later used to justify a factor rotation to obtain a more parsimonious description of the data.
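The rotation invariance derived above can also be verified numerically. The sketch below uses the same style of illustrative loadings and specific variances, together with a 2 x 2 rotation matrix \(\mathbf{T}\) (any rotation angle yields an orthogonal matrix), and checks that the rotated loadings \(\mathbf{L^*} = \mathbf{LT}\) reproduce exactly the same \(\Sigma\):

```python
import numpy as np

# Illustrative loadings and specific variances (values chosen for demonstration).
L = np.array([[0.8, 0.3],
              [0.6, 0.5],
              [0.4, 0.7]])
Psi = np.diag([0.2, 0.3, 0.1])

# Any 2 x 2 rotation matrix is orthogonal; the angle here is arbitrary.
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(T.T @ T, np.eye(2))  # confirm T'T = I

L_star = L @ T  # rotated loadings L* = LT

# Both loading matrices imply exactly the same variance-covariance matrix:
Sigma = L @ L.T + Psi
Sigma_star = L_star @ L_star.T + Psi
print(np.allclose(Sigma, Sigma_star))  # True: the fit is unchanged
```

Since \(\mathbf{L^*L^{*'}} = \mathbf{LTT'L'} = \mathbf{LL'}\), the rotated model is indistinguishable from the original at the level of \(\Sigma\); this is the ambiguity that factor rotation later exploits.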