In Part II of this series it was discussed and demonstrated, with numbers, that features fit a variable best when they are assumed to lie in a curved space rather than the traditional flat one. That perspective was elaborated in Part IV, together with an argument supporting a positive curvature as the best fit. Since other approaches have already been combined with this standpoint, this time eigenvectors will be used, as in the Principal Component Analysis (PCA) method.
These will be computed from the Markov chains predicted for the first five states, as there are five classes. The first states are chosen because the states' probabilities change more in the early periods than in those close to the steady state; therefore, they carry more information about the significance of each class in the state vector. This idea is also shared by the eigenvectors used in this new approach, which can be defined as a description of the intrinsic properties of a linear transformation.
Consequently, as has been argued throughout this series, one of the reasons stochastic processes appear random could be that we always assume a flat space, even though other fields, such as physics, have mathematically shown that we live in a higher-dimensional, curved reality. Thus, the output Markov chain for each instance is taken as a matrix:
These matrices are modified to match the 5x5 shape required to calculate the eigenvectors: if the number of rows is lower than five, more steady states are appended, and in the opposite case only the first five states are taken. Then, using NumPy's linear algebra module, the resulting matrix of eigenvectors is as follows:
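A minimal sketch of this step is shown below, assuming the per-instance Markov output is the sequence of predicted state vectors; the transition matrix `P` and initial state `s0` are hypothetical placeholders, and padding with the last (near-steady) state is used here as an approximation of appending "more steady states":

```python
import numpy as np

def first_k_states(P, s0, k=5):
    """Iterate the Markov chain and stack the first k predicted state vectors as rows."""
    states = [s0]
    for _ in range(k - 1):
        states.append(states[-1] @ P)
    return np.vstack(states)

def to_5x5(states, k=5):
    """Pad with the last (near-steady) state, or truncate, so the matrix is k x k."""
    if states.shape[0] < k:
        pad = np.tile(states[-1], (k - states.shape[0], 1))
        states = np.vstack([states, pad])
    return states[:k, :k]

# Hypothetical 5-class transition matrix and initial state
P = np.full((5, 5), 0.2)
s0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

A = to_5x5(first_k_states(P, s0))
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors
print(eigvecs)
```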
As the reader can see, some eigenvectors have a real and an imaginary part. Why is that? An eigenvector is a non-zero vector that, when the linear transformation A is applied to it, does not change direction and is only multiplied by a scalar λ (lambda) called the eigenvalue. Then, the equation below holds:
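A * v = λ * v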
Rearranged as ( λI − A ) * v = 0 and expressed as a polynomial in λ, the above equation leads to:
p( λ ) = det( λI - A )
This is called the characteristic polynomial of the matrix A. Note that it is written here as λI − A instead of A − λI; since v is non-zero, as mentioned before, both determinants must vanish at the eigenvalues, so the two forms yield the same roots. As the fifth-degree polynomial would be too long to analyze here, the quadratic (2x2) case is shown instead:
λ**2 − tr( A ) * λ + det( A ) = 0
This means that if the discriminant of the polynomial:
tr( A )**2 − 4 * det( A )
is negative, the eigenvalues form a complex-conjugate pair, and the eigenvectors will therefore have real and imaginary components. So nothing is wrong with the calculations so far; complex entries are perfectly expected. Now, this series has proposed four types of distances so far; this time, however, the eigenvectors are subjected to transformations to find out which class has the shortest distance to the hyperplane when a positive curvature "+k" or a negative one "-k" is selected.
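As a quick numerical check of the 2x2 discriminant argument above (the matrix `A2` below is a made-up example), a negative discriminant yields a complex-conjugate pair of eigenvalues, and NumPy accordingly returns complex eigenvectors:

```python
import numpy as np

# Made-up 2x2 example: a rotation-like matrix with a negative discriminant
A2 = np.array([[0.0, -1.0],
               [1.0,  0.0]])

disc = np.trace(A2) ** 2 - 4 * np.linalg.det(A2)
print(disc)                      # -4.0 -> negative discriminant

eigvals, eigvecs2 = np.linalg.eig(A2)
print(eigvals)                   # [0.+1.j, 0.-1.j] -> complex conjugate pair
print(np.iscomplexobj(eigvecs2)) # True: the eigenvectors have imaginary parts
```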
Hence, for this section non-Euclidean metrics must be used, since the distances we are familiar with no longer behave the way we are used to:
Therefore, based on the geometry of the 3D surface of a 4D sphere:
R**2 = x**2 + y**2 + z**2 + w**2
Where:
R**2 = r**2 + w**2 ∧ r**2 = x**2 + y**2 + z**2
And:
r / R = Sin( X ) = Sin( D / R ) ∧ D = R * ArcSin( r / R )
"D" is known as the radial distance and is what will be used as a proxy for the distance to the hyperplane as this is the arc formed by the projection of the eigenvector on the curved space. Similarly, as "r" is unknown for a 5D sphere - five classes in the current problem- the module r from the polar form r∠θ will be used as an estimate of "r".
The visual representation of how the flat space is curved as a function of X can be seen below:
An example of the "D" matrix is as follows:
Notice that both "D" matrices are very similar, and the same holds for all the other ETFs. What does it mean? Regardless of the ETF, the probability of each class always ends up at the same radial distance on the curved surface; thus, at every state it can be known in advance which class will be selected for a given period.
The graph below helps to identify how the eigenvector components (the classes) have been transformed onto a surface with positive curvature:
Notice that the vertical and horizontal axes are no longer both straight. Why? To access the transformation graphs for all ETFs click here. On the other hand, now that the radial distances are known, they are stacked into a single array for running the machine learning model, as sketched below:
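A minimal sketch of that step, using stand-in data: each instance would contribute a flattened 5x5 "D" matrix as a feature row, with its class label as the target. The RandomForestClassifier here is only a placeholder, not necessarily the model used in this series:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data: in practice, each row would be a flattened 5x5 "D" matrix
# (25 radial distances) and each label one of the five classes.
rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi / 2, size=(500, 25))   # placeholder radial distances
y = rng.integers(0, 5, size=500)                # placeholder class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0)  # placeholder model choice
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```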
And the classification report and confusion matrix are:
The results suggest that a positive curvature may not be the best fit for the data; let's compare them with what the flat spaces recorded:
As can be seen, the positive curvature assumption outperforms both the Euclidean and Manhattan distance approaches. The results with negative curvature can be explored using the source code. Download it in full here: