Ruminations on Component Analysis

Most of what I have done in the past year has centered around projections. The crazy part is that I have been doing all of it without exactly knowing what I was doing: I never had any formal training in PCA, LDA or kernel methods (unless YouTube/Wikipedia qualifies as formal). So, now that I am taking MLSP and Pattern Recognition, there are lots of eureka moments in each class. Still, some of the ‘world experts’ who teach our classes are at such a high level of understanding that it is hard to keep up with their perspectives on the matter. Fernando de la Torre, for instance, has come up with a master equation that explains PCA, LDA, kernel PCA, spectral clustering and even K-means as variations on the same theme. I didn’t quite get what the equation means, but the general idea seemed to be minimization of reconstruction error, with certain tweaking parameters that define each projection.
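I don’t have the actual equation, but from “minimization of reconstruction error with tweaking parameters” my best guess at the skeleton is a constrained least-squares problem (this reconstruction is mine, not Fernando’s formulation):

```latex
\min_{B,\,C} \; \lVert X - B C \rVert_F^{2}
\quad \text{subject to constraints on } B \text{ and } C
```

Different constraints would then pick out different methods: an orthonormal $B$ with $C = B^\top X$ looks like PCA, restricting the columns of $C$ to cluster-indicator vectors looks like K-means, and swapping $X$ for a kernel feature map would give the kernel variants.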

The problem with this high-level treatment is that we hear only a couple of sentences about each of these topics in class, and that is it. The task of picking up the necessary tools and implementing them is left to the student. Of the things mentioned in Fernando’s two classes, spectral clustering and Independent Component Analysis have struck my fancy, and obviously kernel methods and SVMs. Robust PCA was a very interesting idea too. There is so much to learn, it is crazy!

For the next few days I will try to read up on a particular topic each day and blog about it. I read up a bit on ICA today and implemented a few things. I will write about it tomorrow when the hour is more reasonable, but I was surprised at how much better than PCA it was. Actually, the pre-processing step for ICA (whitening) does something similar, if not identical, to PCA. With this implementation, I also realized the benefit of having a standard dataset. Since I already have the dataset of 15 appliances that I have run PCA and LDA on, I can immediately see how ICA performs on it, and the results are encouraging. I am thinking of adding it to the paper that we plan to submit to the AEI journal.
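For reference, the whitening I mean centers the data, then rotates and rescales it so the covariance becomes the identity, which is essentially PCA plus a per-component scaling. A minimal sketch (the function and variable names are mine):

```python
import numpy as np

def whiten(X):
    """ZCA-whiten the data: center it, then transform so the
    sample covariance becomes the identity (the usual ICA pre-step).
    X: (n_features, n_samples) data matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center each feature
    cov = Xc @ Xc.T / Xc.shape[1]               # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # PCA: eigendecomposition
    # cov^(-1/2), i.e. rotate into the PCA basis, rescale, rotate back
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return W @ Xc

rng = np.random.default_rng(0)
# Three features with very different scales
X = rng.standard_normal((3, 5000)) * np.array([[5.0], [1.0], [0.2]])
Z = whiten(X)
print(np.round(Z @ Z.T / Z.shape[1], 2))  # ≈ 3x3 identity matrix
```

After this step the components are uncorrelated with unit variance, and ICA only has to find the remaining rotation.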

Some quick thoughts on projections

Suppose S(t) is a musical signal, a piece played on an instrument, say a violin. Suppose we know that this signal is composed of 5 different violin notes, call them n1, n2, …, n5. How do you find which notes were being played in the signal at any given time? [Assume n1 to n5 span S(t) completely.]

This, roughly, was the first question in my MLSP class. The way it is solved is by first finding the spectrum of each note using the STFT. Then the STFT of the aggregate signal is calculated at each time point (i.e., a spectrogram). Then the STFT at each time stamp is projected onto the STFTs of the notes. If the magnitude of a projection is above a certain (empirical) threshold, the note is playing at that time stamp; otherwise it is not. The need for an empirical threshold arises from the fact that the notes aren’t orthogonal to one another. If the notes were orthogonal, the notes that don’t contribute to the signal at a given time would have zero-magnitude projections. I am not completely sure, but I think that if they were orthonormal, the projection onto each contributing note would equal that note’s weight in the mixture, so exactly 1 for a note contributing with unit weight. [Mario? / Emre?]
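Here is a toy version of that pipeline. Instead of real STFTs I make up five sparse “note spectra” that share a few frequency bins, so they are deliberately non-orthogonal; the 0.5 threshold is as empirical as it gets (everything here is my own construction):

```python
import numpy as np

n_bins, n_notes = 64, 5
notes = np.zeros((n_bins, n_notes))
for k in range(n_notes):
    notes[10 * k : 10 * k + 5, k] = 1.0   # each note's main harmonics
    notes[60:64, k] = 0.3                 # shared high-frequency content
notes /= np.linalg.norm(notes, axis=0)    # unit-norm each note spectrum

# Aggregate spectrum at one time stamp: notes 0 and 3 are playing.
frame = 1.0 * notes[:, 0] + 0.8 * notes[:, 3]

proj = notes.T @ frame                    # projection magnitudes
threshold = 0.5                           # empirical, as in the post
active = proj > threshold
print(np.round(proj, 2))                  # → [1.05 0.12 0.12 0.87 0.12]
print(active)                             # → [ True False False  True False]
```

The silent notes come out near 0.12 rather than 0, purely because of the shared bins, which is exactly why the threshold has to be tuned empirically.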

The point is, energy disaggregation is the same problem. In the ideal case, the aggregate signal would be projected onto all of the appliance signatures, and the ones that are OFF would have a magnitude of zero. The problem is, there is no reason the signatures would be orthonormal to each other. And once you try to create an orthonormal basis (say through PCA or LDA) you lose the simple contributes/doesn’t-contribute classification.
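One standard way around the non-orthogonality, short of building a new basis, is to replace raw projections with a least-squares fit, which accounts for the overlap between signatures. A toy sketch with made-up signatures (the 100/15 sizes and appliance indices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake appliance signatures as columns: 15 appliances, 100-sample
# signatures, nonnegative and overlapping, hence far from orthogonal.
S = np.abs(rng.standard_normal((100, 15)))

# Aggregate signal: only appliances 2 and 7 are ON.
true_w = np.zeros(15)
true_w[2], true_w[7] = 1.0, 1.0
aggregate = S @ true_w

# Raw projections smear energy across every signature...
raw = S.T @ aggregate

# ...but least squares (via the pseudoinverse) recovers the ON/OFF
# pattern exactly in this noise-free toy, since the signatures span
# the aggregate.
w, *_ = np.linalg.lstsq(S, aggregate, rcond=None)
print(np.round(w, 2))   # ≈ 1 at indices 2 and 7, ≈ 0 elsewhere
```

With noisy real signatures the recovered weights would not be exactly 0/1, so a threshold would still be needed, but at least the overlap between signatures is no longer the thing being thresholded.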

Maybe, once we calculate an orthonormal basis for the appliance signatures, we could project the appliance signatures into that basis as well. That way we know how much each signature contributes to each basis component [say, the first principal component]. After that we can project the aggregate signal into the orthonormal basis and know exactly which basis components contribute to the signal.

Then you go and look at the projections of the signatures into these basis components and see which signatures contribute to them [there has to be some statistical way to do this]. Besides, it sounds reasonable that the new set of orthonormal basis vectors (orthonormal signatures) would still span the same space as the one spanned by the signatures.
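My attempt at making that concrete (entirely my own sketch, not from class): build an orthonormal basis from the signatures via the SVD, record every signature’s coordinates in that basis, express the aggregate in the same coordinates, and then solve back to signature weights.

```python
import numpy as np

rng = np.random.default_rng(3)
S = np.abs(rng.standard_normal((100, 15)))       # made-up signatures as columns

# Orthonormal basis for the span of the signatures (uncentered PCA via SVD).
U, _, _ = np.linalg.svd(S, full_matrices=False)  # U: (100, 15), orthonormal columns

C = U.T @ S                 # coordinates of each signature in the basis
aggregate = S[:, 2] + S[:, 7]   # appliances 2 and 7 are ON
c = U.T @ aggregate         # coordinates of the aggregate in the same basis

# aggregate = S @ w implies c = C @ w; recover the signature weights.
w, *_ = np.linalg.lstsq(C, c, rcond=None)
print(np.round(w, 2))       # ≈ 1 at indices 2 and 7, ≈ 0 elsewhere
```

Since U is built from the signatures themselves and we keep all 15 components, it spans exactly the signature space, which matches the intuition above; the extra step C = UᵀS is the “projection of the signatures into the basis” that links basis components back to appliances.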

Either I am getting a better understanding of projections, or I am ruining my understanding by overthinking things in terms of NILM.