On the avoidance of crossing of singular values in the evolving factor analysis

Evolving factor analysis (EFA) investigates the evolution of the singular values of matrices formed by a series of measured spectra, typically resulting from the spectral observation of an ongoing chemical process. In the original EFA, the logarithms of the singular values are plotted for submatrices that include an increasing number of spectra. A typical observation in these plots is that pairs of singular value trajectories are on a collision course, but finally the curves seem to repel each other and then run in different directions. For parameter-dependent square matrices, such behaviour of the eigenvalues is known as avoidance of crossing. Here, we adapt the explanation of this avoidance of crossing to the singular value curves of EFA. Further, we study a condition that breaks this avoidance of crossing. We demonstrate that understanding this noncrossing allows us to design model data sets with a predictable crossing behaviour.


INTRODUCTION
The rank of a spectral data matrix $D$ can be estimated by the number of above-noise-level eigenvalues of the symmetric matrix $D^T D$. In many chemical applications, the rows of the data matrix are formed by a series of spectra taken as a function of time or, more generally, of process progress. Here, $D$ is a $k \times n$ matrix, where $k$ is the number of spectra and $n$ is the number of wavelengths at which the spectra are measured. In such instances, the rank is often interpreted as the chemical rank, that is, the number of linearly independent species that coexist in the process under investigation. In evolving factor analysis (EFA), the development of the rank is analysed further by determining the rank of submatrices of $D$. EFA was first introduced by Gampp et al. [1] and was improved by Maeder et al. [2][3][4] It has found a large number of applications in various fields of analytical chemistry as a model-free method for fast information extraction. Typically, the data are recorded from ongoing chemical reactions, chromatographic processes, spectrophotometric titrations, or processes that change under varying parameters such as temperature, pH value, or time. [5] In the original EFA, the submatrices $D[\ell]$ are formed by a growing number of rows of $D$, starting with the first, then the first two, then the first three rows, etc.:

$D[\ell] = D(1:\ell, :) \in \mathbb{R}^{\ell \times n}, \quad \ell = 1, \dots, k.$ (1)

In order to introduce the topic of the present paper, let us consider the following three-component chromatographic data set. On the wavelength interval $\lambda \in [400, 600]$, we take the three spectral profiles

$s_1(\lambda) = g(\lambda, 450, 30), \quad s_2(\lambda) = 3g(\lambda, 500, 30), \quad s_3(\lambda) = 2g(\lambda, 550, 30)$ (2)

with the Gaussian $g(x, a, b) = \exp(-(x - a)^2 / (b/2)^2)$. The elution (concentration) profiles on the time interval $t \in [0, 100]$ are supposed to be … (3)

These profiles are shown in Figure 1. The functions are discretised by using $\Delta\lambda = 0.25$ and $\Delta t = 0.1$. This yields a matrix $S \in \mathbb{R}^{801 \times 3}$ of spectral profiles and a matrix $C \in \mathbb{R}^{1001 \times 3}$ of elution profiles.
Thus, the absorbance data matrix $D = CS^T$ is a $1001 \times 801$ matrix. We add normally distributed noise (mean 0, variance 1), scaled to about 0.1% of the maximal absorption.
If no noise is added and if $\ell \geq 3$, the rank of all the $D[\ell]$ equals 3. If 0.1% of noise is added, then the fourth and all following singular values are close to zero. Figure 1 shows in its lower row the four largest singular values of $D[\ell]$ for $\ell = 10t$ and $t = 0.1, 0.2, \dots, 100$. The curves of the three largest singular values show the typical behaviour: they seem to follow the concentration profiles of the three species and appear to be on a collision course with another curve, but finally the curves seem to repel each other. This repulsion is clearly shown by the sectional enlargement in the lower right plot of Figure 1. This behaviour of the singular value curves is typical of most EFA curves and can be found in many such plots in the referenced papers on EFA. A mathematical explanation for this noncrossing of the singular value curves is given in the next section.
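The construction of the model data and of the EFA curves can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the spectra follow (2), but the elution profiles used below (Gaussians centred at t = 30, 50, 70 with width parameter 30) are hypothetical placeholders and not the profiles of the paper. The helper names `gaussian` and `efa_forward`, the random seed, and the coarse step of 50 rows are likewise choices made here for illustration.

```python
import numpy as np

def gaussian(x, a, b):
    """Gaussian profile g(x, a, b) = exp(-(x - a)^2 / (b/2)^2)."""
    return np.exp(-((x - a) ** 2) / (b / 2) ** 2)

def efa_forward(D, n_sv=4, step=1):
    """Largest n_sv singular values of the leading row blocks D[:ell, :]."""
    ells = np.arange(step, D.shape[0] + 1, step)
    sv = np.zeros((len(ells), n_sv))
    for row, ell in enumerate(ells):
        s = np.linalg.svd(D[:ell, :], compute_uv=False)  # sorted descending
        m = min(n_sv, len(s))
        sv[row, :m] = s[:m]
    return ells, sv

lam = np.linspace(400.0, 600.0, 801)  # wavelength grid, step 0.25
t = np.linspace(0.0, 100.0, 1001)     # time grid, step 0.1

# Spectra as in (2).
S = np.column_stack([gaussian(lam, 450, 30),
                     3 * gaussian(lam, 500, 30),
                     2 * gaussian(lam, 550, 30)])

# ASSUMPTION: hypothetical elution profiles, not those of the paper.
C = np.column_stack([gaussian(t, 30, 30),
                     gaussian(t, 50, 30),
                     gaussian(t, 70, 30)])

D = C @ S.T  # 1001 x 801 absorbance matrix

# Noise: standard normal, scaled to 0.1% of the maximal absorption.
rng = np.random.default_rng(0)
D_noisy = D + 1e-3 * D.max() * rng.standard_normal(D.shape)

# Coarse sampling of the row count (every 50th row) to keep the sketch fast.
ells, sv = efa_forward(D_noisy, n_sv=4, step=50)
```

Plotting `np.log10(sv)` against `ells` gives curves of the kind discussed above; a finer `step` resolves the regions where two curves approach each other.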

FIGURE 3
Case of nearly orthogonal elution profiles. At a first rough glance, a crossing of the singular values seems to take place. The singular value curves are drawn in black in order to avoid a premature interpretation concerning potential noncrossing.

AVOIDANCE OF CROSSING
First, the avoidance of crossing of the eigenvalues of a parameter-dependent matrix is illustrated by a matrix family $C(\alpha)$ (4) formed from symmetric matrices $A$ and $B$ and a real parameter $\alpha$. Symmetry of the matrices is essential for the following analysis; later, we connect the singular values of $D[\ell]$ with the square roots of the eigenvalues of the symmetric matrix $D[\ell]^T D[\ell]$. Figure 2 shows the eigenvalues of $C(\alpha)$ for a more or less random choice of symmetric $2 \times 2$ matrices $A$ and $B$ against $\alpha \in [0, 1.5]$. Starting at $\alpha = 0$, the two eigenvalues $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$ approach each other for increasing $\alpha$ but then show the typical behaviour of a mutual repulsion. Around $\alpha = 0.55$, the difference of the eigenvalues is smallest, and after this, the distance increases monotonically. This phenomenon is known as avoidance of crossing or noncrossing. It was observed in quantum mechanics for parameter-dependent Hamiltonian operators and was investigated by Wigner and von Neumann. [6] The eigenvalue noncrossing of symmetric matrices is closely related to the question of how likely it is that a symmetric matrix with random matrix elements has multiple eigenvalues. To our knowledge, the best explanation in a linear algebra textbook was given by Lax. [7, section 9.5]

Next, we recapitulate the argumentation by Lax, which is based on a study of the likelihood that an arbitrary symmetric matrix has a degenerate eigenvalue, namely, an eigenvalue of multiplicity 2. First, we state that a symmetric $n \times n$ matrix has $n(n + 1)/2$ degrees of freedom (dof). This is the number of its matrix elements that can independently be assigned real numbers; its subdiagonal elements determine its superdiagonal elements by symmetry. There is a second way to count these dof by considering the eigenvalue/eigenvector decomposition of the symmetric matrix. To this end, we count the dof of the eigenvalues and of the associated eigenvectors. If all eigenvalues of the matrix are simple, then we have $n$ dof for the eigenvalues.
The associated eigenvectors are normalized and form an orthogonal matrix. Hence, the first eigenvector has $n - 1$ dof (namely, for the components of the eigenvector, where the last component is determined by the normalization). The second eigenvector has $n - 2$ dof, since it is additionally orthogonal to the first one, and so on down to the last eigenvector, for which no dof remains. In total, we again get $n + (n - 1) + (n - 2) + \dots + 1 + 0 = n(n + 1)/2$ dof.

In the case of a degenerate matrix with an eigenvalue of multiplicity 2 and all remaining eigenvalues being simple, the latter summation is to be modified as follows: First, only $n - 1$ eigenvalues can be chosen. If we start by counting the dof of the eigenvectors associated with the simple eigenvalues, the argumentation is as above. When we reach the second to last and the last eigenvector, which belong to the degenerate eigenvalue, the eigenspace is completely determined and no dof remains. Thus, we get

$(n - 1) + \sum_{j=2}^{n-1} j = \frac{n(n + 1)}{2} - 2$

degrees of freedom. Next, $C(\alpha)$ is considered as a curve depending on the parameter $\alpha$. (In order to follow this geometric interpretation, imagine the example of the parameter-dependent vector $c(\alpha) = (\sin(\alpha), \cos(\alpha))^T$ that forms a circle in the two-dimensional space for $\alpha \in [0, 2\pi]$.) As it is unlikely for the curve $C(\alpha)$ in the $N$-dimensional space, $N = n(n + 1)/2$, to hit a surface that depends on $N - 2$ parameters, the avoidance of crossing is a typical behaviour for eigenvalues. (In order to illustrate the last argument, consider the $2 \times 2$ matrix (4), which has $N = 3$ degrees of freedom. Then, $C(\alpha)$ is a curve in a three-dimensional space for which it is unlikely to hit a surface of dimension $N - 2 = 1$, namely, a second curve. Metaphorically speaking, two randomly moving helium atoms in an evacuated vessel will nearly never collide.)
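The mutual repulsion of the two eigenvalues can be reproduced numerically. The following sketch assumes the affine form A + alpha * B for the parameter-dependent matrix (an assumption made here, with alpha denoting the real parameter) and uses randomly drawn symmetric 2 × 2 matrices, which are not the matrices behind Figure 2. For a symmetric 2 × 2 matrix with entries p, q, r the eigenvalue gap is sqrt((p - r)^2 + 4q^2), which the code uses as an independent check.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_symmetric(n, rng):
    """Random symmetric n-by-n matrix (illustrative helper)."""
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A = random_symmetric(2, rng)
B = random_symmetric(2, rng)

# ASSUMPTION: parameter-dependent matrix of the form A + alpha * B.
alphas = np.linspace(-10.0, 10.0, 4001)
eig = np.array([np.linalg.eigvalsh(A + a * B) for a in alphas])  # ascending
gap = eig[:, 1] - eig[:, 0]

# Closed form of the gap for 2x2 symmetric matrices: |x + alpha * y| with
# x = (a11 - a22, 2 a12) and y = (b11 - b22, 2 b12).
x1, x2 = A[0, 0] - A[1, 1], 2 * A[0, 1]
y1, y2 = B[0, 0] - B[1, 1], 2 * B[0, 1]
gap_closed_form = np.hypot(x1 + alphas * y1, x2 + alphas * y2)

# Smallest possible gap over all real alpha (distance of a line from the origin).
min_gap_exact = abs(x1 * y2 - x2 * y1) / np.hypot(y1, y2)

i_min = np.argmin(gap)
print(f"smallest gap on the grid: {gap[i_min]:.4f} at alpha = {alphas[i_min]:.3f}")
```

The gap vanishes only if both components of x + alpha * y are zero for the same alpha. These are two conditions on a single parameter, which is the counting argument in miniature.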
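For the smallest case n = 2, the counting can be checked by hand. The following worked equation is a sketch with matrix entries a, b, c introduced here only for illustration:

```latex
M = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \qquad
\lambda_{\pm} = \frac{a + c}{2} \pm \frac{1}{2}\sqrt{(a - c)^2 + 4b^2}, \qquad
\lambda_{+} = \lambda_{-} \iff a = c \ \text{and} \ b = 0 .
```

A double eigenvalue thus requires two independent conditions. The degenerate matrices are exactly the multiples of the identity, a one-parameter set inside the three-dimensional space of symmetric 2 × 2 matrices, in agreement with N − 2 = 1.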

FORCING A CROSSING OF THE SINGULAR VALUES/EIGENVALUES
In some cases, the crossing of eigenvalues can be forced. [8] By the following example, we show that pairwise (approximately) orthogonal elution profiles seem to produce an eigenvalue crossing. To this end, we reuse the model problem from Section 1 but increase the mutual distances of the centres of the elution profiles (3). So we consider the nearly orthogonal profiles … $g(t, 68, 7)$. (5)
The spectra are still given by (2). The numerical results are shown in Figure 3 with $\Delta\lambda = 0.2$ and $\Delta t = 0.1$. A sectional enlargement of the crossing point cannot confirm an avoidance of crossing. The curves of singular values are drawn as black lines in order not to imply a certain behaviour. However, the numerical resolution is limited as we do not have …

However, there is still a non-negligible influence of the spectra. If we use the orthogonal profiles (5) but modify the spectra (2) from Section 1 to spectra with a stronger overlap, then the avoidance of crossing of the singular values can be observed, see Figure 4. We conclude that, for these problems, orthogonality of the spectra together with orthogonality of the elution profiles results in a true singular value crossing. In fact, the following mathematical analysis shows that the orthogonality of the spectra together with the orthogonality of the elution profiles is a sufficient condition that makes an eigenvalue crossing possible. If both $C$ and $S$ have orthogonal columns, which are not necessarily orthonormal, then these matrices can be represented as

$C = P D_P, \quad S = Q D_Q$

with matrices $P \in \mathbb{R}^{k \times m}$ and $Q \in \mathbb{R}^{n \times m}$ having orthonormal columns and diagonal scaling matrices $D_P$ and $D_Q$. Then $P$ and $Q$ satisfy $P^T P = I_{m \times m}$ and $Q^T Q = I_{m \times m}$.
By direct calculation, we get that

$D^T D = S C^T C S^T = Q D_Q D_P P^T P D_P D_Q Q^T = Q (D_P D_Q)^2 Q^T, \quad D D^T = P (D_P D_Q)^2 P^T.$

The diagonal entries of $D_P$ and $D_Q$ are the column norms $\|c_i\|$ and $\|s_i\|$. If we denote the $i$th column of $Q$ by $q_i$ and the $i$th column of $P$ by $p_i$, then the last equations can be rewritten as

$D^T D \, q_i = \|c_i\|^2 \|s_i\|^2 \, q_i, \quad D D^T p_i = \|c_i\|^2 \|s_i\|^2 \, p_i.$

This means that the eigenvectors of $D^T D$ are the columns $q_i$ of $Q$ and that the associated eigenvalues are $\|c_i\|^2 \|s_i\|^2$. Similarly, the $p_i$ are the eigenvectors of $D D^T$. Thus, the singular value of $D$ that is associated with $q_i$ is $\|c_i\| \cdot \|s_i\|$, namely the square root of $\|c_i\|^2 \|s_i\|^2$. We conclude that the $i$th EFA singular value $\sigma_i$ for the matrices $D[\ell]$ of growing dimension by (1) equals

$\sigma_i(\ell) = \|c_i(1:\ell)\| \cdot \|s_i\|.$ (7)

This means that the curve of the $i$th singular value is determined only by the $i$th concentration profile $c_i$ and its time development $c_i(1:\ell)$ as well as by the associated spectrum $s_i$. The other profiles $c_j$ and $s_j$ for $j \neq i$ do not affect the $i$th singular value. This implies that these singular value curves can cross; they completely ignore the behaviour of the other singular values.
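The product formula for the EFA singular values can be checked numerically. The sketch below uses triangle profiles with pairwise disjoint supports, so that the columns of C remain orthogonal for every leading row block. The supports, heights, grids, and the helper names `triangle` and `efa_predicted` are hypothetical choices made here for illustration, not data from the paper.

```python
import numpy as np

def triangle(x, left, right, height=1.0):
    """Triangle profile supported on [left, right], peak at the midpoint."""
    mid = 0.5 * (left + right)
    half = 0.5 * (right - left)
    return height * np.clip(1.0 - np.abs(x - mid) / half, 0.0, None)

t = np.linspace(0.0, 100.0, 501)      # time grid (assumed)
lam = np.linspace(400.0, 600.0, 401)  # wavelength grid (assumed)

# ASSUMPTION: hypothetical profiles with pairwise disjoint supports.
C = np.column_stack([triangle(t, 5, 45, 1.0),
                     triangle(t, 45, 70, 2.5),
                     triangle(t, 70, 95, 1.5)])
S = np.column_stack([triangle(lam, 410, 470, 1.0),
                     triangle(lam, 470, 540, 3.0),
                     triangle(lam, 540, 590, 2.0)])
D = C @ S.T

def efa_predicted(C, S, ell):
    """Predicted values ||c_i(1:ell)|| * ||s_i|| for each component i."""
    return np.linalg.norm(C[:ell, :], axis=0) * np.linalg.norm(S, axis=0)

# Compare the prediction with the numerically computed singular values.
max_err = 0.0
for ell in range(10, D.shape[0] + 1, 10):
    computed = np.linalg.svd(D[:ell, :], compute_uv=False)[:C.shape[1]]
    predicted = np.sort(efa_predicted(C, S, ell))[::-1]  # descending order
    max_err = max(max_err, float(np.max(np.abs(computed - predicted))))

# Components 1 and 2 swap their order between t = 45 and t = 100.
pred_mid = efa_predicted(C, S, 226)   # rows up to t = 45
pred_end = efa_predicted(C, S, 501)   # all rows
```

Because each predicted curve depends on one component only, the curve of the second component overtakes that of the first without any repulsion. In the sorted singular values returned by the SVD this appears as a crossing.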
If $C$ and $S$ have orthogonal columns, then the singular value curves can cross. However, we do not claim the converse, namely, that a crossing is impossible for non-orthogonal matrices. Assuming the existence of crossing curves in the case of non-orthogonality, the avoidance of crossing rule shows that arbitrarily small changes of such a system will nearly always change the singular values in such a way that their curves are on a noncrossing course. In the language of mathematics, the set of matrices with multiple singular values is of measure zero in the set of all matrices. Such cases are not accessible numerically in the presence of rounding errors or for experimental data with their limited precision.

DESIGN OF MODEL SYSTEMS WITH A PREDICTED CROSSING BEHAVIOUR OF THE SINGULAR VALUES
The analysis in Section 3, which determines the $i$th singular value of EFA to be given by (7), allows us to design a model system with a completely predictable behaviour of the curves of singular values. Next, we replace the Gaussian profiles by simple triangle profiles with compact supports. The advantage of such profiles is that the orthogonality constraint can easily be implemented. In contrast, and in a strict mathematical sense, a pair of Gaussian profiles can never be orthogonal but can only be close to orthogonal if the centres of the Gaussians are well separated. Next, four model problems are considered, each with five chemical components. The elution profiles and the spectra of these components are modelled by triangle profiles. … Figure 5 for all these profiles. Further, Figure 5 shows the sparsity pattern of $D^T D$, namely, the so-called spy plot in Matlab. Its structure is that of a block diagonal matrix, in accordance with the analysis in Section 3. The curves of singular values show the typical crossing behaviour: each curve crosses all other curves.

Experiment II: Compared with the first experiment, we move the elution profile of the third component along the t-axis so that the orthogonality with the elution profile of the fourth component is broken. The spectra are the same as in the first experiment. Figure 6 shows …

In these four experiments, we have modified the spectra in three cases, and in one case we have forced the concentration profiles to overlap. There is no need to make further experiments in which these modifications are applied to the other factor, as this will not provide new results. The reasoning is as follows: First, a transposition applied to the spectral data matrix $D = CS^T$ results in $D^T = SC^T$, so that $C$ and $S$ change places. If $D$ and $D^T$ have the same (nonzero) singular values, then $C$ and $S$ have a comparable influence on the singular values.
In order to show this, let $D = U \Sigma V^T$ be a singular value decomposition of $D$. Then $D^T = V \Sigma^T U^T$ is a singular value decomposition of $D^T$. Thus, the nonzero singular values of $D$ and $D^T$ and their multiplicities are the same. A different way to express the equal influence of $C$ and $S$ on the singular values of $D$ (or $D^T$) is the form $\|c_i\| \cdot \|s_i\|$ of the singular values, see Section 3, since only the Euclidean norms of $c_i$ and $s_i$ determine the singular values, in an interchangeable, symmetric way.
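The transposition argument is easy to confirm numerically. The following minimal sketch uses random nonnegative factors of assumed sizes 50 × 3 and 40 × 3 (hypothetical stand-ins for elution profiles and spectra) and compares the singular values of the product with those of its transpose.

```python
import numpy as np

rng = np.random.default_rng(2)

# ASSUMPTION: random nonnegative stand-ins for C and S, sizes chosen freely.
C = rng.random((50, 3))
S = rng.random((40, 3))
D = C @ S.T

# Singular values of D and of its transpose (both sorted descending).
sv_D = np.linalg.svd(D, compute_uv=False)
sv_Dt = np.linalg.svd(D.T, compute_uv=False)

print(np.max(np.abs(sv_D - sv_Dt)))
```

Both calls return min(50, 40) = 40 values, of which only the first three are above rounding level, since the product has rank 3.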

SUMMARY AND CONCLUSION
EFA curves are traditionally used to indicate the appearance or disappearance of new chemical species by changes in the number of above-noise-level singular values. A frequently observed behaviour of EFA plots is that the curves of singular values have an inherent repulsive nature. Even if the curves of two singular values are on a collision course, they finally seem to repel each other and run in different directions.
This paper points out that this behaviour of the curves of singular values can be explained by the mathematical property that it is unlikely for a symmetric parameter-dependent matrix to have double eigenvalues or eigenvalues with an even higher multiplicity.
Hence, the chemometrician can interpret the avoidance of crossing of the singular value curves as a typical and natural phenomenon for reaction systems with overlapping/non-separated pure component spectra. Conversely, if a crossing of singular value curves is observed, then according to Section 4, this can indicate that the chemical reaction system contains some overlap-free (or orthogonal) pure component spectra. Knowledge about overlap-free pure component spectra is welcome in a multivariate curve resolution analysis, as it simplifies the pure component decomposition process. In this sense, the present analysis, by exploiting intricate matrix properties, helps to understand the behaviour of EFA curves and can potentially support the chemometric pure component analysis of chemical reaction systems.
In future work, we hope to use the local rank information of EFA plots to reduce the rotational ambiguity underlying the pure component factorization problem.