Creating power traces for non-intrusive load monitoring from GT events in BLUED

Although BLUED provides event-level ground truth (GT) for all appliances, it is not clear how to derive power-level ground truth from it. Since the whole goal of NILM is to estimate the power of individual appliances, it would be useful to have an estimate of appliance-level consumption in BLUED. The closest we can get is to take each GT event label, go back to the aggregate power to find the corresponding change, and attribute that change to the label.

I ran a basic analysis to create power traces for the appliances. The algorithm worked as follows.
1. Group the GT labels by the kind of sensor they came from.

Mac IDs for plug-level GT: [1,2,3,8,11,12,18,20,23,27,28,29,31,32,34,35,40]
Mac IDs for GT from environmental sensors: [47,48,49,50,51,52,53,55,56,57,58,59]
Mac IDs for GT from circuit-level monitoring: [4,7,9,10,11]

2. Every time there is a GT event associated with a Mac ID, extract the corresponding change in the aggregate (mains) power: the difference between the value 30 samples (0.5 seconds) after the event and the value 5 samples before it.

3. From these power changes, reconstruct the trace by assuming the appliance's power stays constant at the observed step after each positive change and is zero after each negative change.
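The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the results in this post: the function names are made up, and it assumes the aggregate power is a 1-D array and that each GT event has already been converted to a sample index into that array.

```python
import numpy as np

def event_deltas(aggregate_power, event_indices, pre=5, post=30):
    """Step 2: power `post` samples after each event minus power `pre` samples before."""
    power = np.asarray(aggregate_power, dtype=float)
    deltas = []
    for idx in event_indices:
        before = max(idx - pre, 0)               # clamp at the start of the recording
        after = min(idx + post, len(power) - 1)  # clamp at the end of the recording
        deltas.append(power[after] - power[before])
    return np.array(deltas)

def reconstruct_trace(n_samples, event_indices, deltas):
    """Step 3: hold the step value after a positive change, zero after a negative one."""
    trace = np.zeros(n_samples)
    order = np.argsort(event_indices)
    idx_sorted = np.asarray(event_indices)[order]
    deltas_sorted = np.asarray(deltas)[order]
    for k, (idx, delta) in enumerate(zip(idx_sorted, deltas_sorted)):
        # Each event's value is held until the next event (or the end of the trace).
        end = idx_sorted[k + 1] if k + 1 < len(idx_sorted) else n_samples
        trace[idx:end] = delta if delta > 0 else 0.0
    return trace
```

Note that each ON event simply overwrites the previous level, so an appliance with several power states is reduced to whatever its most recent positive step was.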

To see how well this works, I compared the reconstructed traces against the plug-level GT we have in BLUED. The energy discrepancy is partly due to calibration errors in the plug meters (they were not individually calibrated, because their purpose was only to provide event-level GT). Part of it is also due to the different sampling rates of the plugs and of the power calculations in BLUED: the plugs report one sample every 0.66 seconds, while the power I calculated from the aggregate data was at 1 Hz. I tried to account for that by resampling the plug data at 2/3 of its frequency. That aside, the discrepancy also points to the limitations of estimating power with event-based methods. Even in cases like this one, where complete event-level GT is available, the estimated energy for the week can be off by as much as 270%. Table 1 details the comparison of energy estimates for all plug meters.

Table 1

| Mac ID | Number of events | Energy from aggregate power (kWh) | Energy from plug (kWh) | Error (%) |
|---|---|---|---|---|
| 1 | 26 | 0.86 | 0.23 | 273.17 |
| 2 | 25 | 0.93 | 0.49 | 88.25 |
| 3 | 24 | 0.06 | 0.10 | -43.01 |
| 8 | 16 | 0.13 | 0.10 | 32.81 |
| 11 | 616 | 9.51 | 6.95 | 36.87 |
| 12 | 8 | 3.62 | 4.88 | -25.89 |
| 18 | 45 | 1.70 | 1.84 | -7.56 |
| 20 | 14 | 1.39 | 1.29 | 8.10 |
| 23 | 34 | 0.85 | 2.43 | -64.95 |
| 27 | 20 | 0.07 | 0.06 | 7.07 |
| 28 | 77 | 0.73 | 0.94 | -22.92 |
| 29 | 54 | 5.36 | 6.36 | -15.71 |
| 31 | 150 | 0.06 | 0.16 | -63.78 |
| 32 | 8 | 0.29 | 0.27 | 9.09 |
| 34 | 40 | 0.14 | 0.14 | -4.19 |
| 35 | 2 | 0.00 | 0.08 | -99.12 |
| 40 | 150 | 0.44 | 0.63 | -30.55 |
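For reference, here is a small sketch of how energy figures and errors of this kind can be computed. The function names are illustrative, the linear interpolation is just one possible way to bring the 1.5 Hz plug data onto a 1 Hz grid, and the error formula (aggregate minus plug, relative to plug) is inferred from the numbers in the table rather than taken from the original code.

```python
import numpy as np

def resample_to_1hz(values, source_rate_hz=1.5):
    """Linearly interpolate a uniformly sampled series onto a 1 Hz grid."""
    values = np.asarray(values, dtype=float)
    t_src = np.arange(len(values)) / source_rate_hz  # original timestamps in seconds
    t_dst = np.arange(0.0, t_src[-1] + 1e-9, 1.0)    # one sample per second
    return np.interp(t_dst, t_src, values)

def energy_kwh(power_watts, sample_rate_hz=1.0):
    """Integrate a uniformly sampled power trace (W) into energy (kWh)."""
    joules = np.sum(np.asarray(power_watts, dtype=float)) / sample_rate_hz
    return joules / 3.6e6  # 1 kWh = 3.6e6 J

def percent_error(estimate, reference):
    """Signed error of `estimate` relative to `reference`, in percent."""
    return 100.0 * (estimate - reference) / reference
```

With the rounded table entries for Mac ID 12, `percent_error(3.62, 4.88)` comes out close to the listed -25.89; the small difference is expected because the table inputs are rounded to two decimals.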

To check whether some of the observed discrepancy is due to calibration error in the plugs, I looked at the mean power consumption of each appliance (the mean of all observed step changes) in both the aggregate data and the plug data. The results were as follows:

Table 2

| Mac ID | Estimated mean consumption from aggregate (W) | Estimated mean consumption from plug (W) | Approx. calibration error of plug (%) |
|---|---|---|---|
| 1 | 29.09 | 11.75 | 59.59 |
| 2 | 30.87 | 23.90 | 22.57 |
| 3 | 418.33 | 354.73 | 15.20 |
| 8 | 778.59 | 874.25 | -12.29 |
| 11 | 157.21 | 208.64 | -32.72 |
| 12 | 44.35 | 43.11 | 2.78 |
| 18 | 42.08 | 57.68 | -37.08 |
| 20 | 28.75 | 37.73 | -31.22 |
| 23 | 19.82 | 80.69 | -307.03 |
| 27 | 1240.11 | 1163.13 | 6.21 |
| 28 | 30.95 | 35.08 | -13.35 |
| 29 | 170.09 | 189.19 | -11.23 |
| 31 | 1003.14 | 799.96 | 20.25 |
| 32 | 1664.38 | 1271.66 | 23.60 |
| 34 | 1418.46 | 1429.12 | -0.75 |
| 35 | 64.99 | 50.55 | 22.21 |
| 40 | 30.48 | 36.49 | -19.72 |

Here I assumed that the mean value calculated from the aggregate power is closer to the rated power consumption of the device (which is also what was done in the table presented in the original BLUED paper). I then re-calibrated the energy values obtained from the plugs using the calibration errors in Table 2 above. The following table lists the re-adjusted error in the energy calculated from aggregate power data (with perfect GT) when compared to the plug-level energy GT.

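Table 3

| Mac ID | Energy after calibration adjustment (kWh) | Error after calibration adjustment (%) |
|---|---|---|
| 1 | 0.37 | -133.82 |
| 2 | 0.60 | -53.58 |
| 3 | 0.12 | 50.53 |
| 8 | 0.09 | -51.42 |
| 11 | 4.68 | -103.42 |
| 12 | 5.02 | 27.89 |
| 18 | 1.16 | -46.92 |
| 20 | 0.89 | -57.16 |
| 23 | -5.03 | 116.93 |
| 27 | 0.07 | -0.81 |
| 28 | 0.82 | 11.05 |
| 29 | 5.65 | 5.04 |
| 31 | 0.19 | 69.88 |
| 32 | 0.33 | 11.73 |
| 34 | 0.14 | 3.46 |
| 35 | 0.10 | 99.28 |
| 40 | 0.51 | 13.49 |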
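The arithmetic behind the calibration adjustment can be sketched as below. The three formulas are reconstructed from the numbers in the tables (they reproduce the tabulated values to within rounding), so treat them as a best guess at the calculation rather than the original code; the function names are made up.

```python
def calibration_error_pct(mean_aggregate_w, mean_plug_w):
    """Plug calibration error, taking the aggregate-derived mean as the reference."""
    return 100.0 * (mean_aggregate_w - mean_plug_w) / mean_aggregate_w

def adjusted_plug_energy(plug_energy_kwh, calibration_pct):
    """Scale the plug energy by the estimated calibration error."""
    return plug_energy_kwh * (1.0 + calibration_pct / 100.0)

def adjusted_error_pct(adjusted_kwh, aggregate_kwh):
    """Error of the aggregate-derived energy relative to the adjusted plug energy."""
    return 100.0 * (adjusted_kwh - aggregate_kwh) / adjusted_kwh

# Mac ID 12, using the rounded values from the tables above.
cal = calibration_error_pct(44.35, 43.11)   # close to the tabulated 2.78
adj = adjusted_plug_energy(4.88, cal)       # close to the tabulated 5.02
err = adjusted_error_pct(adj, 3.62)         # close to the tabulated 27.89
```

One consequence of scaling by (1 + error): a calibration error below -100%, as for Mac ID 23, produces a negative adjusted energy, which would explain the -5.03 kWh entry in that row.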

Below are a few plots of the estimated power trace compared to the plug-level power trace. The red line is the power trace reconstructed from event-level GT, and the blue line is the GT from plug-level data.

What is clear from these plots is that most of the error comes from the fact that no event-based algorithm can go back to the aggregate power at the time of an event and reliably estimate the power delta that caused it. And since that information is only available during events, a state-based method is probably more reliable.

Some of the power traces created from the environmental-sensor GT are shown below.

I am sharing the power traces for all the devices with their mac_ids so that they can be used as a reference by anyone who wants to do power estimation using BLUED. I am also sharing the code for extracting this information from a BLUED power-based dataset. [Actually, sharing is not possible through our website just yet, but it should be in a few days.]

Some quick thoughts on projections

Suppose S(t) is a musical signal, a piece played on a single instrument, say a violin. Suppose we know that this signal is composed of 5 different violin notes; call them n1, n2, …, n5. How do you find which notes are being played at any given time? [Assume n1 to n5 span S(t) completely.]

This, roughly, was the first question in my MLSP class. The way to solve it is to find the spectrum of each note using the STFT. The STFT is then calculated at each time point for the aggregate signal (i.e. a spectrogram), and the STFT at each time stamp is projected onto the STFT of each note. If the magnitude of a projection is above a certain (empirical) threshold, the note is being played at that time stamp; otherwise it is not. The threshold has to be empirical because the notes aren’t orthogonal to one another. If the notes were orthogonal, the notes that don’t contribute to the signal at that time would have a projection of zero magnitude. I am not completely sure, but I think that if they are orthonormal, the projection onto each contributing note would equal that note’s weight in the mixture (so it would be 1 only for a note played at unit amplitude). [Mario ?/ Emre?]
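A toy version of the projection-and-threshold idea looks something like this. It uses made-up vectors in place of real STFT magnitude frames, and the function name and threshold value are purely illustrative. The example uses orthonormal "notes" on purpose, to show what the projections look like in that special case.

```python
import numpy as np

def note_activations(spectrum, note_spectra, threshold):
    """Project one spectrum frame onto unit-norm note spectra and threshold the result.

    spectrum:     shape (F,), one frame of the aggregate spectrogram
    note_spectra: shape (F, K), one column per note
    """
    notes = note_spectra / np.linalg.norm(note_spectra, axis=0, keepdims=True)
    projections = notes.T @ spectrum
    return projections, np.abs(projections) > threshold

# Five orthonormal "notes" in a 6-bin spectrum (a hypothetical, idealized case).
notes = np.eye(6)[:, :5]
frame = 0.7 * notes[:, 0] + 2.0 * notes[:, 3]  # notes 1 and 4 are playing

proj, active = note_activations(frame, notes, threshold=0.1)
# proj is [0.7, 0, 0, 2.0, 0]: exactly zero for silent notes, and equal to the
# mixing weight (not necessarily 1) for the notes that are playing.
```

With real, overlapping note spectra the silent notes no longer project to zero, which is where the empirical threshold comes in.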

The point is that energy disaggregation is the same problem. In the ideal case the aggregate signal would be projected onto all of the appliance signatures, and the ones that are OFF would have a magnitude of zero. The problem is that there is no reason for the signatures to be orthonormal to each other. And once you create an orthonormal basis (say through PCA or LDA) you lose the simple contributes/doesn’t contribute classification.

Maybe, once we calculate the orthonormal basis for the appliance signatures, we could project the appliance signatures onto this basis as well. That way you know how much each signature contributes to each basis component [say the first principal component]. After that we can project the aggregate signal onto the orthonormal basis and know exactly which basis vectors contribute to the signal.

Then you go and look at the projections of the signatures onto this basis and see which signatures contribute to those basis vectors [there has to be some statistical way to do this]. Besides, it sounds reasonable that the new orthonormal basis (the orthonormal signatures) would still span the same space as the one spanned by the original signatures.

Either I am getting a better understanding of projections, or I am ruining my understanding by overthinking things in terms of NILM.
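Here is one way the idea could be tried out on toy data. Note the swap: it uses a QR decomposition (essentially Gram-Schmidt) rather than PCA or LDA to build the orthonormal basis, because QR keeps exactly the span of the signatures and hands back the projection of each signature onto the basis for free. The signatures are random vectors and the setting is noise-free, so this is a sketch of the linear algebra only, not a disaggregation method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns are hypothetical appliance signatures; they are not orthogonal.
signatures = rng.random((8, 3))
true_weights = np.array([1.0, 0.0, 2.5])  # the second appliance is OFF
aggregate = signatures @ true_weights

# Naive approach: project the aggregate directly onto the unit-norm signatures.
unit = signatures / np.linalg.norm(signatures, axis=0)
naive = unit.T @ aggregate  # the OFF appliance does not come out as zero

# Orthonormal basis Q spanning the same space; R = Q.T @ signatures holds the
# projection of every signature onto that basis.
Q, R = np.linalg.qr(signatures)

coeffs = Q.T @ aggregate              # aggregate expressed in the orthonormal basis
weights = np.linalg.solve(R, coeffs)  # map basis coefficients back to signatures
```

In this noise-free case `weights` recovers `true_weights`, including the zero for the OFF appliance. Mapping back through `R` is the same as solving a least-squares problem on the original signatures, so the orthonormal basis is mostly a convenient intermediate step; with noise, or with more signatures than dimensions, something statistical is still needed to decide what counts as "zero".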


Prelude: Impostor Syndrome

At Zico’s seminar on NILM (and data-driven demand response) yesterday, Prof. Maria Ilic brought up an interesting bit of history. Apparently, when she was at MIT around twenty years ago, Leeb, Norford and others were laying the foundations for what was to become NILM. Leeb, she said, was of the opinion that energy disaggregation is an NP-hard problem that cannot be solved completely. Based on that, she asked Zico why he believes the problem is worth trying to solve. Zico answered that increased computational power has made it possible to apply novel inference methods to integer programming formulations of these problems. Apparently this has been done successfully in other blind source separation problems, and he believes that energy disaggregation is likely to follow the same path.

Given the amount of time, energy and resources that have been spent on this problem, I believe it is time to ask whether the benefits of energy disaggregation are worth the effort. Everyone cites the same study that promises 20% savings, but when I try to picture myself as the typical consumer, I am not convinced about the effectiveness of such feedback.

Let’s assume a perfect scenario: I am told that my fridge accounts for 20% of my monthly bill. I am also told that this is 30% above the national average. Let’s take it a step further. Let’s say I am also told that if I want to save 15%, I need to buy an Energy Star certified fridge that will have a payback period of 5 years.

Will this change my behavior? No! Will I buy a new fridge? NO!

The only time this MIGHT change my behavior is if it turns out that I am an especially careless consumer and things like lightbulbs or TVs account for most of my energy use, so that I can turn them off when I am not using them. But then again, it is highly unlikely that these appliances make up most of my bill. An average consumer doesn’t need a disaggregated bill to remind them to turn their heating/cooling off when leaving the house or to turn the lights off at night. According to Derrick, around eight appliances account for 80% of the total electric bill in an average household, in which case the major consumers like the washer, dryer, AC, fridge, dishwasher, cooktop etc. make up most of the bill. The micro-level disaggregation that we aim for doesn’t even matter from a practical energy-savings perspective. Sometimes I get the sense that it is pursued in academia solely because of the challenge it presents and not because of its possible implications. I sort of get why industry would be interested in this. It is a very marketable idea, and a device that promises to do disaggregation with minimal installation/training (regardless of the accuracy) will be easy to pitch to the consumer base. Plus, the potential market and the profit margin are huge (because, in theory at least, this is solely a computational problem).

So, motivation from a purely energy-savings standpoint no longer excites me. Actually, I am not sure it ever did. I am also skeptical about the demand response part of this, because that requires either a hardware installation, which would eliminate the need for device classification (labeling can be done during installation), or a smart appliance with built-in support for automated control, in which case classification is also unnecessary.

The true motivation for me lies in the potential of this method to convey information about a particular device non-intrusively. And, in this case, we are talking non-intrusive as in without entering the device. I’ll elaborate more on a few thoughts that I have on this matter in a future post.