Prelude: Impostor Syndrome

While attending Zico’s seminar on NILM (and data-driven demand response) yesterday, Prof. Maria Ilic brought up an interesting bit of history. Apparently, when she was at MIT around twenty years ago, Leeb, Norford, and others were laying the foundations for what was to become NILM. Leeb, she said, was of the opinion that energy disaggregation is an NP-hard problem that cannot be solved completely. Based on that, she asked Zico why he believes this problem is worth trying to solve. Zico answered that applying novel inference methods to integer-programming formulations of these problems has become possible because of increased computational power. Apparently this has been done successfully in other blind source separation problems, and he believes that energy disaggregation is likely to follow the same path.

Given the amount of time, energy, and resources that have been spent on this problem, I believe it is time to question whether the benefits of energy disaggregation are worth the effort. Everyone cites the same study that promises 20% savings, but when I try to picture myself as the typical consumer, I am not convinced of the effectiveness of such feedback.

Let’s assume a perfect scenario: I am told that 20% of my monthly bill comes from my fridge. I am also told that this is 30% above the national average. Let’s take it even a step further. Let’s say I am also told that if I want to save 15%, I need to buy an Energy Star certified fridge that will have a payback period of 5 years.

Will this change my behavior? No! Will I buy a new fridge? NO!

The only time this MIGHT change my behavior is if it turns out that I am an extra-careless consumer and things like light bulbs or TVs are consuming most of my energy, in which case I could turn them off when I am not using them. But then again, it is highly unlikely that these appliances will be eating up most of my bill. An average consumer doesn’t need a disaggregated bill to remind them to turn their heating/cooling off when leaving the house or to turn the lights off at night. According to Derrick, around eight appliances consume 80% of the total electric bill in an average household, and in that case the major consumers like the washer, dryer, AC, fridge, dishwasher, cooktop, etc. will account for most of the bill. The micro-level disaggregation that we aim for doesn’t even matter from a practical energy-savings perspective.

Sometimes I get the sense that this problem persists in academia solely because of the challenge it presents and not its possible implications. I sort of get why industry would be interested in it. It is a very marketable idea, and a device that promises to do disaggregation with minimal installation/training (regardless of accuracy) will be easy to pitch to the consumer base. Plus the potential buyer’s market and the profit margin are huge (because, in theory at least, this is solely a computational problem).

So, motivation from a purely energy-savings standpoint fails to excite me anymore. Actually, I am not sure it ever did. I am also skeptical about the demand-response angle, because it either warrants hardware installation, which would eliminate the need for device classification (since labeling can be done at installation time), or a smart appliance with built-in support for automated control, in which case classification is also unnecessary.

The true motivation for me lies in the potential of this method to convey information about a particular device non-intrusively. And, in this case, we are talking non-intrusive as in without entering the device. I’ll elaborate more on a few thoughts that I have on this matter in a future post.

MATLAB tidbits

Here’s how you’d implement a low-pass Butterworth filter in MATLAB. Note that fNorm is the cutoff frequency normalized by the Nyquist frequency, i.e. fc/(fs/2) for a cutoff fc and sampling rate fs. This has come in very handy.

fNorm = fc / (fs/2);                      % cutoff normalized by the Nyquist frequency
[b, a] = butter(10, fNorm, 'low');        % 10th-order low-pass Butterworth filter
filtered_signal = filtfilt(b, a, signal); % zero-phase (forward-backward) filtering
Here’s how you’d add (Gaussian white) noise:

noisy_data = awgn(data, SNR);  % SNR in dB; add 'measured' to scale noise to the signal's actual power
MATLAB can be a real pain when trying to save CSV files with mixed data types (say, string + numeric). But to try things like SVM and Naive Bayes in Weka, you need your labels to be nominal (not numeric). Fortunately, this guy wrote this code, which works like a charm. All you have to do (after downloading that file) is:

cell2csv('filename.csv', cell_array_that_you_want_to_write, ',');

Some quick things in R

Part of a revelation I’ve had over the past couple of months is that different programs are good at doing different things.

For instance, R is very intuitive and fast when it comes to classification and projection. Here is how you would implement Naive Bayes and k-NN in R.

Naive Bayes in R (requires the e1071 package; make sure the columns for train and test have the same headings):
> library(e1071)
> train <- read.csv("train_7dim.csv")
> test <- read.csv("test_7dim.csv")
> dft <- data.frame(cbind(train, label))  # label is the vector of training labels
> head(dft)
> dft$X0 <- as.factor(dft$X0)             # X0 is the label column; naiveBayes needs a factor
> m <- naiveBayes(X0 ~ ., dft)
> table(predict = predict(m, test))

k-NN in R (requires the class package):
> library(class)
> knn(train, test, label, k = 1, l = 0, prob = FALSE, use.all = TRUE)
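
Since knn() returns the predicted labels directly, checking accuracy is one comparison away. A quick sketch, assuming the true test labels are stored in a vector test_label (a hypothetical name; substitute your own):

> library(class)
> pred <- knn(train, test, label, k = 1)
> table(predicted = pred, actual = test_label)  # confusion matrix
> mean(pred == test_label)                      # overall accuracy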