A sequential Monte Carlo formulation for contextually supervised load disaggregation

This is a particle filtering approach to estimating HVAC consumption, but it can be extended to estimating any load, depending on what context information is available. In the case of HVAC, the context information is outside temperature. For appliances like lighting, the context information could be outside light intensity; for actively operated appliances, it could be motion; for periodic appliances (e.g., dishwasher or stove), it could be time of day.

Data preprocessing:
I worked with Pecan Street data (10 houses; one week at 1-minute resolution and one week at 15-minute resolution, with appliance-level ground truth). The steps: find the lag between the temperature time series and HVAC power by cross-correlating total power with temperature; smooth the data (a sliding window with a stride of 1 sample and a width of 10 samples was used); and add the furnace and air compressor/condenser channels to obtain the HVAC ground truth.
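A minimal sketch of the lag-estimation and smoothing steps (the function names and the `max_lag` bound are my own; the post only specifies cross-correlation and a width-10 sliding window):

```python
import numpy as np

def find_lag(total_power, temperature, max_lag=30):
    """Estimate the lag (in samples) between temperature and total power
    by locating the peak of their normalized cross-correlation."""
    p = (total_power - total_power.mean()) / total_power.std()
    t = (temperature - temperature.mean()) / temperature.std()
    corrs = [np.corrcoef(p[lag:], t[:len(t) - lag])[0, 1]
             for lag in range(max_lag + 1)]
    return int(np.argmax(corrs))

def smooth(series, width=10):
    """Sliding-window (moving average) smoothing with the stated width."""
    kernel = np.ones(width) / width
    return np.convolve(series, kernel, mode="same")
```

Adding the furnace and compressor/condenser channels is then just an elementwise sum of the two smoothed series.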

The problem is modeled as a particle filtering problem, where the hidden state at each time step (variable x) is the HVAC power consumption. The observed variable is the total power consumption (variable z). The action taken at each time point that leads to the next state is represented by θ; in this case, the outside temperature acts as the action variable.

As is clear from the figure above, we need to know the transfer function between z and x. I assumed the relationship was linear, and used a primitive model based on the assumption that HVAC accounts for ~50% of total power in Texas during the summer. So,
z = 2x + N(μ1, σ1). This assumption can afford to be crude, since the model compares its estimates of z with the observed z to correct the estimates of x.

Now we need to model the relationship between x and θ. It is also important to model θ as the action parameter that displaces x(t-1) to x(t). For this, I used the assumption x(t) = z(t)/2, and then fit a plane x(t) = a·θ + b·x(t-1) + c to the observed data. The R² values (after robust regression) were >0.98 for all houses, which gives confidence in the linear model.
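The regression step can be sketched as follows (ordinary least squares shown for simplicity where the post used robust regression; `fit_transition` is a hypothetical helper name):

```python
import numpy as np

def fit_transition(total_power, temperature):
    """Fit the linear transition model x(t) = a*theta(t) + b*x(t-1) + c,
    using the crude proxy x(t) = z(t)/2 for HVAC power.
    Returns the coefficients (a, b, c) and the R^2 of the fit."""
    x = total_power / 2.0                      # proxy for HVAC power
    X = np.column_stack([temperature[1:],      # theta(t)
                         x[:-1],               # x(t-1)
                         np.ones(len(x) - 1)]) # intercept
    y = x[1:]                                  # x(t)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot
```

On data that really is near-linear in (θ, x(t-1)), the R² from this fit plays the same confidence-check role as the >0.98 values quoted above.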

Other parameters to consider were how many particles to sample; I chose 1000, but it didn't seem to matter much. I chose Gaussian noise distributions for both transfer functions. Their variances mattered a lot, and could probably be learned better with cross-validation on training data.
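Putting the pieces together, a bootstrap particle filter under the stated assumptions (Gaussian noise on both transfer functions, 1000 particles) might look like the sketch below. The function and parameter names are illustrative, not the author's code; a, b, c are the regression coefficients, and x_noise/z_noise correspond to the x_N and x_R variances in the tables.

```python
import numpy as np

def particle_filter(z, theta, a, b, c, n_particles=1000,
                    x_noise=0.55, z_noise=0.05, seed=0):
    """Estimate HVAC power x from total power z and temperature theta.
    Transition:  x(t) = a*theta(t) + b*x(t-1) + c + N(0, x_noise)
    Observation: z(t) = 2*x(t) + N(0, z_noise)"""
    rng = np.random.default_rng(seed)
    T = len(z)
    particles = np.full(n_particles, z[0] / 2.0)   # init from the crude 50% prior
    estimates = np.empty(T)
    estimates[0] = particles.mean()
    for t in range(1, T):
        # propagate particles through the transition model
        particles = (a * theta[t] + b * particles + c
                     + rng.normal(0.0, np.sqrt(x_noise), n_particles))
        # weight by the likelihood of the observed total power
        w = np.exp(-0.5 * (z[t] - 2 * particles) ** 2 / z_noise)
        if w.sum() == 0:                           # degenerate weights: go uniform
            w = np.full(n_particles, 1.0 / n_particles)
        else:
            w = w / w.sum()
        estimates[t] = np.dot(w, particles)        # posterior mean estimate
        # multinomial resampling
        particles = particles[rng.choice(n_particles, n_particles, p=w)]
    return estimates
```

Because the observation z is highly informative about x, the filter pulls the crude prior toward the observed data at every step, which is what makes the 40% (vs. true 70%) assumption in the next experiment survivable.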

The following are the results on house 1. In this case, I assumed HVAC consumption was 40% of the total (just to see how far I could push it; in actuality it is ~70% for this data). The error in total consumption was 2.52%. The RMS error at each time point was 13% (which I don't think is as relevant). When I compare this to the result of applying the transfer function directly (dividing each total-power sample by 2.5 and summing), the error is 37.6%. This confirms that the results aren't simply biased by the choice of transfer function, and that learning is taking place based on the observations.
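The two numbers being contrasted here are different metrics: one measures the error in total (summed) consumption over the window, the other the per-time-point deviation. A sketch of both (the helper names are mine):

```python
import numpy as np

def aggregate_error(estimate, truth):
    """Relative error in total consumption over the window
    (the headline percentage used in the tables)."""
    return abs(np.sum(estimate) - np.sum(truth)) / np.sum(truth)

def rms_error(estimate, truth):
    """Per-time-point RMS error, normalized by the mean true value."""
    estimate, truth = np.asarray(estimate), np.asarray(truth)
    return np.sqrt(np.mean((estimate - truth) ** 2)) / np.mean(truth)
```

Note that an estimate can have zero aggregate error while having large RMS error, since over- and under-estimates at different time points cancel in the sum.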

Next, I tried the model on all houses. The results are pretty good, as shown in the following two tables. I chose the parameter values (the variances of the noise models, x_N and x_R) loosely from how the filter performed on one day of house 1's data, and kept them constant throughout. The aligning window was based on visual alignment between total power and temperature. The "HVAC factor" column refers to the accuracy of total HVAC consumption for the week if a ballpark figure of 60% of total load were taken to be HVAC (based on Texas numbers). The "regression" column refers to the accuracy if only the regression model (without particle filtering) were used. The fact that the particle filtering results are better than both of these "prior" assumptions on the data means that the system is learning from the observable and correcting itself.

Parameters: x_N, x_R. Preprocessing: aligning window, smoothing window. Accuracy: last three columns.

| Frequency | House | x_N  | x_R  | Aligning window | Smoothing window | Particle filter | HVAC factor | Regression |
|-----------|-------|------|------|-----------------|------------------|-----------------|-------------|------------|
| 15 min    | 1     | 0.55 | 0.05 | 14              | 10               | 3.90%           | 6.52%       | 8.19%      |
|           | 2     | 0.55 | 0.05 | 22              | 10               | 2.58%           | 17.95%      | 46.65%     |
|           | 3     | 0.55 | 0.05 | 16              | 10               | 2.99%           | 10.33%      | 57.23%     |
|           | 4     | 0.55 | 0.05 | 5               | 10               | 0.29%           | 0.69%       | 63.40%     |
|           | 5     | 0.55 | 0.05 | 18              | 10               | 0.49%           | 5.93%       | 70.82%     |
|           | 6     | 0.55 | 0.05 | 7               | 10               | 2.75%           | 5.36%       | 10.07%     |
|           | 7     | 0.55 | 0.05 | 11              | 10               | 0.59%           | 3.92%       | 6.91%      |
|           | 8     | 0.55 | 0.05 | 13              | 10               | 0.27%           | 12.30%      | 50.10%     |
|           | mean  |      |      |                 |                  | 1.73%           | 7.88%       | 39.17%     |

The Pecan Street sample data also includes another week (in September) for the same houses at a different frequency (1 minute). I resampled it at 15 minutes and redid the analysis (the resampling was done to reduce computation time; simulation methods can take a long time).
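The 1-minute-to-15-minute resampling can be done by simple block averaging (a sketch; the post does not specify the resampling method used):

```python
import numpy as np

def downsample(series, factor=15):
    """Resample a 1-minute series to 15-minute resolution by averaging
    each block of `factor` samples (any incomplete trailing block is
    truncated)."""
    series = np.asarray(series, dtype=float)
    n = (len(series) // factor) * factor
    return series[:n].reshape(-1, factor).mean(axis=1)
```

Block averaging preserves total energy over complete blocks, which matters here because the headline metric is total weekly consumption.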

Parameters: x_N, x_R. Preprocessing: aligning window, smoothing window. Accuracy: last three columns.

| Frequency                   | House | x_N  | x_R  | Aligning window | Smoothing window | Particle filter | HVAC factor | Regression |
|-----------------------------|-------|------|------|-----------------|------------------|-----------------|-------------|------------|
| 1 min (resampled to 15 min) | 1     | 0.55 | 0.05 | 17              | 10               | 9.11%           | 13.33%      | 31.16%     |
|                             | 2     | 0.55 | 0.05 | 16              | 10               | 13.67%          | 19.01%      | 31.44%     |
|                             | 3     | 0.55 | 0.05 | 15              | 10               | 3.58%           | 10.67%      | 36.96%     |
|                             | 4     | 0.55 | 0.05 | 18              | 10               | 2.87%           | 9.80%       | 52.79%     |
|                             | 5     | 0.55 | 0.05 | 4               | 10               | 7.40%           | 8.80%       | 17.20%     |
|                             | 6     | 0.55 | 0.05 | 14              | 10               | 1.64%           | 11.13%      | 43.59%     |
|                             | 7     | 0.55 | 0.05 | 11              | 10               | 14.74%          | 16.85%      | 4.49%      |
|                             | mean  |      |      |                 |                  | 7.57%           | 12.80%      | 31.09%     |

The only thing that was changed was the aligning window, since resampling changes it. The aligning window is a variable that encompasses the lag between a temperature change and HVAC operation; it is also a function of the thermal integrity of the house.

One of the things that bothers me is that the houses in Pecan Street have a very strong HVAC component: on average, 60-70% of total consumption is HVAC, more specifically cooling (which is probably understandable given it is Texas). I want to try this on a more generic dataset as well. For that, I am working on house 1 of the SMART dataset right now. The data processing (aligning everything according to timestamps, getting temperature info) has been taking some time.

The temperature fluctuates from 29 °F to 98 °F over the three months, but ground truth is only available for the Furnace HRV channel. The question I have is: does Furnace HRV include all heating/cooling-related consumption?


Addendum ideas (afterthoughts/notes to self):
    1. Report R² values as a check metric. [Spin it as a real-time (DR) and aggregate (NILM) estimator.]
    2. Convert normalization in regression.
    3. Instead of looking only for HVAC, look for the factor at each time point.
    4. Do it for all of the Pecan Street set (maybe).

