1 Introduction

Wind power plays an important role in solutions to the climate crisis.

Carbon neutrality requires a large number of new wind power plants in Finland. According to Motiva, there were 754 wind turbines in Finland by the end of 2019, generating about 5.9 TWh of electricity. That amounts to 7% of domestic consumption.

If the electricity generation pattern of wind turbines is properly understood, consumption can be varied dynamically to match it. The intermittent nature of wind power generation would then be manageable without additional polluting and expensive reserve power systems.

1.1 Generation Capacity

If a wind turbine’s nominal capacity is 3 MW, it could generate \(3\,MW \times 24\,h = 72\,MWh\) in a day. In practice, such a turbine in Finland generates about a third of that (Tuulivoimayhdistys, 2020). Generation depends heavily on weather conditions and fluctuates with, for example, the season, time of day, and location.

The 754 wind turbines in Finland have a total capacity of 2284 MW. The average efficiency, or capacity factor, is thus the yearly generation divided by the maximum possible yearly generation:

\[ \frac{5.9\,TWh}{2284\,MW \times 365 \times 24\,h} = 29.5\% \]
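
As a quick sanity check in R (with 5.9 TWh expressed as MWh):

5.9e6 / (2284 * 365 * 24)  # ~0.295, the average capacity factor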

1.2 Problem Analysis

What is the trend of the generation efficiency? What kind of generation patterns should we prepare for, when more and more electricity is generated by wind turbines?

Let’s fit a linear regression to a time series of wind generation values. The regression helps us understand both the potential capacity and the mean of wind power. The mean and the trend show what real-world efficiency to expect from wind generation in Finland. The trend can also be used to make a rough estimate of the remaining potential: for example, are all the good locations already taken, or is the trend still rising? A rising, or even a constant, trend would imply that there’s still profitable untapped potential.

1.3 Modeling Idea

Fit a simple linear regression model to the data. Compare the simple fit with another, hierarchical model that makes use of seasonal information.

The generation efficiency varies with the weather. But instead of gathering arbitrary weather data, let’s use something more generalizable. In Finland the weather varies by season. From Wikipedia/Suomen_ilmasto and Ilmatieteenlaitos/termiset-vuodenajat:

  • During the winter, the mean temperature is below zero.
  • In the summer, the mean temperature is above +10°C.

Most of the wind farms are on the north-western coast, so let’s use seasonal data from around the Oulu region.

  • On average, summer begins Jun 1st and ends Sep 1st (3 months).
  • Winter begins Nov 1st and ends Apr 1st (5 months).
  • Spring and autumn last 2 months each.
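
For reference, these boundaries map month numbers to thermal seasons as follows. This is a hypothetical helper for illustration only; the models below group the data by month directly.

# Hypothetical helper: thermal season around Oulu for month number m,
# using the assumed boundaries listed above
thermal_season <- function(m) {
  c("winter", "winter", "winter", "spring", "spring",
    "summer", "summer", "summer", "autumn", "autumn",
    "winter", "winter")[m]
}
thermal_season(7)  # "summer"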

2 Description of Data

library("ggplot2")
library("bayesplot")
library("data.table")
library("lubridate")
library("dplyr")
library("rstan")
library("bayesplot")
library("loo")
d <- fread("~/notebooks/project/fingrid_10y.csv")

We’ll use open data from the Finnish transmission system operator Fingrid. This is not a ready-made data set, so it hasn’t been used before as such. There are, of course, analyses of other wind generation data, for example (VTT, 2013), (Koivisto et al., 2014), and (Koivisto, 2015).

The data used here is downloaded from Fingrid’s open data service. It represents hourly wind power generation from March 2015 to November 2020.

par(mfrow=c(1, 2))
d %>% group_by(Year = year(timestamp)) %>% 
  summarise(Capacity = last(capacity), .groups = 'drop') %>%
  with(barplot(Capacity ~ Year, col = 'salmon', main = "Nominal capacity"))
d %>% group_by(Month = ceiling_date(timestamp, 'month')) %>%
  summarise(Generated = mean(generated), .groups = 'drop') %>%
  with(plot(Generated ~ Month, type = 'l', main = 'Generated electricity',
            ylim = c(0, 2000)))

During the past six years the wind power generation capacity in Finland has more than tripled.

c(first(d$capacity), last(d$capacity), last(d$capacity) / first(d$capacity))
## [1]  625.0000 2268.0000    3.6288

We’ll use the relative efficiency (generation divided by nominal capacity) to see how the generation efficiency has developed over the years.
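
The geneff column used below is this relative efficiency; a minimal sketch of its derivation, from the hourly generated and capacity columns seen above:

# Relative generation efficiency: hourly generation / nominal capacity
d <- mutate(d, geneff = generated / capacity)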

2.2 Wide Fluctuations

The data shows the intermittent nature of wind: the generated power fluctuates wildly throughout the year. For example, here’s a two-week time span showing how the generated power quickly drops from near maximum to near minimum. The horizontal line shows the average power generation efficiency.

par(mar=c(2, 2, 0, 0))
d %>% filter(timestamp %within% as.interval(weeks(2), ymd("2017-09-30"))) %>% 
  with(plot(geneff ~ timestamp, type='l', col='blue', lwd=1))
abline(h=.3, col='red', lty=3)

2.2 Weekly Summary

We’ll summarize the roughly 50k rows of hourly observations into weekly means and standard deviations to make modeling feasible. Summarizing smooths the data while retaining information about the extent of the fluctuations. There are 298 weeks of data.

# Weekly time series
w <- mutate(d, week = isoweek(timestamp), month = month(timestamp)) %>%
  group_by(ts = floor_date(timestamp, "week")) %>%
  summarise(mug = mean(geneff), sig = sd(geneff), month = median(month), 
            week = median(week), .groups = 'drop')
nrow(w)
## [1] 298
w$ma12 <- frollmean(x=w$mug, n=12, align = "center") # 12 weeks moving average
ggw <- ggplot(data = w, aes(x=ts, y=mug)) +
  geom_point(color = "#56B4E988", size = 1) +
  labs(x = "Week", y = "Generation efficiency")
ggw + geom_line(data = w, aes(y = ma12), color = "#0072B288", na.rm = TRUE) +
  labs(title = "Weekly efficiency and 12 weeks moving average")

3 Model Implementations

A time series is typically modeled as a combination of components, such as trend, seasonality, and error. Each component represents a different effect on the series. The combination can be additive or multiplicative, depending on how the components act. Here we assume that the variance of the generation efficiency does not increase with time but remains stable. The assumption is based on the fact that the efficiency is bounded between 0 and 1, and technical development should only increase it. On the other hand, the climate crisis may cause more extreme weather conditions.

An additive model is a sum \(y(t) = g(t) + s(t) + \epsilon\), where \(g(t)\) is the trend, \(s(t)\) is the seasonal component, and \(\epsilon\) is a normally distributed error term.

3.1 Linear Regression

The simplest model is a linear regression without a seasonal component. The model can be used to analyze the overall trend: is the power generation efficiency increasing, decreasing, or constant? The mathematical representation of the model is

\[\begin{aligned} y_{i} & \sim N(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta \times t_i \\ \alpha & \sim N(0.3, 0.2) \\ \beta & \sim N(0, 0.1) \\ \sigma & \sim N(0.25, 0.5), \end{aligned}\]

where \(t_i\) is the time step, i.e. the week number from 1 to 298. Sigma is normally distributed but constrained to positive values, expressed in Stan as real<lower=0> sigma;.

3.2 Hierarchical Seasonal

Add a seasonal component using a hierarchical model that groups the data by month. We assume that the distinct seasons in Finland affect the generation efficiency, and that months share common seasonal characteristics over the years. Therefore we can model the seasonal data as a \(6 \times 12\) matrix: six years of 12 months. A mathematical summary of the hierarchical part of the model is

\[\begin{aligned} y_{ij} & \sim N(\mu_j, \sigma_j) \\ \mu_j & \sim N(\theta, \tau) \\ \theta & \sim N(0.3, 0.2) \\ \sigma_j & \sim N(0.25, 0.5), (>0)\\ \tau & \sim N(0.25, 0.5), (>0) \end{aligned}\]

In addition to the hierarchical seasonal component \(s(t)\), we have a trend component \(g(t)\) similar to the previous model.

4 Prior Selection

Weakly informative priors are selected based on previous scientific knowledge about the generation efficiency.

4.1 Prior for Efficiency

Fingrid’s figures give a mean efficiency of 29.5%, while the Finnish Wind Power Association (Tuulivoimayhdistys, 2020) reports 33%. We’ll use a normal prior with mean 0.3 and standard deviation 0.2. 50% of the distribution lies between 0.17 and 0.43, which are plausible values for the average efficiency.
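
The 50% interval quoted above can be verified directly:

# Central 50% interval of the N(0.3, 0.2) efficiency prior
qnorm(c(.25, .75), mean = 0.3, sd = 0.2)  # ~0.17 and ~0.43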

The standard deviation for efficiency is relatively large, because in stormy weather the turbines generate almost at maximum, and when it’s not windy there is no generation at all. A weakly informative prior for the standard deviation is thus a relatively wide N(0.25, 0.5), constrained to positive values.

4.2 Prior for Slope

We don’t know whether the trend is positive or negative, and want the data to tell us. We’ll use a normal distribution with zero mean and 0.1 standard deviation, although an even tighter standard deviation could be used: if there is a trend, it’s no more than a few percentage points per year.
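
Note that on a yearly scale this prior is still very permissive, which illustrates why an even tighter one would be justifiable:

# 90% prior interval for the yearly (52-week) change implied by N(0, 0.1)
qnorm(c(.05, .95), mean = 0, sd = 0.1) * 52  # roughly -8.6 to +8.6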

5 Linear Regression Model

5.1 Stan Code

comp_m <- stan_model("linear_gaussian_priors.stan")
print(comp_m)
## S4 class stanmodel 'linear_gaussian_priors' coded as follows:
## data {
##   int<lower=0> N;
##   vector[N] y;     // efficiency
##   vector[N] x;     // observation timestamp
## }
## 
## parameters {
##   real alpha;
##   real beta;
##   real<lower=0> sigma;
## }
## 
## transformed parameters {
##   vector[N] mu = alpha + beta * x;
## }
## 
## model {
##   alpha ~ normal(.3, .2);
##   beta ~ normal(0, .1);
##   sigma ~ normal(.25, .5);
##   y ~ normal(mu, sigma);
## }
## 
## generated quantities {
##   vector[N] y_rep;    // y_rep for posterior predictive check
##   vector[N] log_lik;  // log_lik for LOO
##   
##   for (n in 1:N) {
##     y_rep[n] = normal_rng(mu[n], sigma);
##     log_lik[n] = normal_lpdf(y[n] | mu[n], sigma);
##   }
## }

5.2 Fit

stan_data <- list(N = nrow(w),
                  y = w$mug,
                  x = 1:nrow(w))
fit.1 <- sampling(comp_m, data = stan_data, refresh = -1, seed = 42,
                  sample_file = "m1_sample",
                  diagnostic_file = "m1_diag")
summary(fit.1, pars = c('alpha', 'beta', 'sigma'), probs = c(.05, .5, .95))$summary
##               mean      se_mean           sd           5%          50%
## alpha 0.2735652996 4.203728e-04 1.533525e-02 0.2477480264 0.2738037276
## beta  0.0002809177 2.236874e-06 9.013461e-05 0.0001335225 0.0002781514
## sigma 0.1316872476 1.424872e-04 5.326438e-03 0.1231414140 0.1313981050
##                95%    n_eff     Rhat
## alpha 0.2983256698 1330.800 1.003874
## beta  0.0004298759 1623.679 1.002689
## sigma 0.1408143652 1397.403 1.004685
# Beta is almost certainly positive
mcmc_hist(fit.1, pars = c('beta'))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The model fits well, with Rhat values <1.01, a good number of effective samples, and no divergences in the HMC computation. The slope parameter beta is slightly positive, so the efficiency trend would be increasing over time. A 12-month average of the data gives a similar result:

rbind(filter(d, timestamp < d$timestamp[1]+months(12)) %>%
    summarise(mean(geneff), mean(capacity)),
  filter(d, timestamp > last(timestamp)-months(12)) %>%
    summarise(mean(geneff), mean(capacity)))

5.3 Plot the Trend

mu <- apply(extract(fit.1)$mu, 2, quantile, c(0.05, 0.5, 0.95)) %>%
  t() %>% data.frame(.)
mu$ts = w$ts
(gglr <- ggw + 
  geom_line(data = mu, aes(y = X50.), color = "#FF99FFDD") +
  geom_ribbon(data = mu, aes(y = X50., ymin = X5., ymax = X95.), fill = "#FF99FF44") +
  labs(title = "Linear regression and its 90% probability interval"))

5.4 Posterior Predictive Check

yrep <- extract(fit.1, pars = 'y_rep')$y_rep
bayesplot_grid(
  ppc_dens_overlay(w$mug, yrep[1:50, ]),
  ppc_stat(w$mug, yrep[1:500,], stat = function(x){sum(x<0)}, binwidth = 1),
  grid_args = list(ncol=2), titles = c("Simulated", "<0 values"))

Posterior predictive checks show that the model is not perfect. The normal distribution misses a shoulder in the generation efficiency density curve; that’s a minor issue. The posterior also has trouble with the mode and the right-hand tail, and above all, there is probability mass on negative efficiency values. Let’s resolve these issues with a data transformation.

5.5 PSIS-LOO

loo <- loo(fit.1, save_psis = FALSE)
print(loo)
## 
## Computed from 4000 by 298 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    180.3 12.0
## p_loo         3.1  0.4
## looic      -360.5 24.1
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

The LOO estimate is accurate with all k values < 0.5. elpd_loo is 180.

6 Data Transformed Gaussian

We are estimating power generation efficiency, which is a relative value between 0 and 1. As the posterior predictive check above showed, the model predicts some negative values. Let’s use a logit transformation to prevent values outside the range (0,1). The model becomes

\[\begin{aligned} \text{logit}(y_{i}) & \sim N(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta \times t_i \\ \alpha & \sim N(-0.85, 0.85) \\ \beta & \sim N(0, 0.1) \\ \sigma & \sim N(0.5, 0.5), (>0) \end{aligned}\]
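
The intercept prior is centered at the expected efficiency on the logit scale, which can be checked directly:

# Prior mean for alpha: expected efficiency 0.3 on the logit scale
qlogis(0.3)  # ~ -0.847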

Let’s fit a Gaussian model to logit-transformed data.

fit <- stan(file = "linear_gaussian_logit.stan", data = stan_data, refresh = 0)
print(fit, pars = c("alpha", "beta", "sigma"), probs = c(.05, .5, .95))
## Inference for Stan model: linear_gaussian_logit.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##        mean se_mean   sd    5%   50%   95% n_eff Rhat
## alpha -1.05       0 0.08 -1.18 -1.05 -0.92  1619    1
## beta   0.00       0 0.00  0.00  0.00  0.00  1886    1
## sigma  0.66       0 0.03  0.62  0.66  0.70  1453    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Dec  4 14:27:56 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

6.1 Posterior Predictive Check

This looks good: the tails match the data better, and the mode is closer to the observed density.

yrep <- extract(fit, pars = 'y_rep')$y_rep
bayesplot_grid(
  ppc_dens_overlay(w$mug, yrep[1:50, ]),
  ppc_stat(w$mug, yrep[1:500,], stat = function(x){sum(x>0.5)}, binwidth = 1),
  ppc_stat(w$mug, yrep[1:500,], stat = function(x){sum(x<0)}, binwidth = 1),
  grid_args = list(nrow=3), titles = c("Simulation", ">.5 values", "<0 values"))

6.2 Plot

theta <- apply(extract(fit)$theta, 2, quantile, c(0.05, 0.5, 0.95)) %>%
  t() %>% data.frame(.)
theta$ts = w$ts
gglr + geom_line(data = theta, aes(y = X50.), color = "#D55E00DD") +
  geom_ribbon(data = theta, aes(y = X50., ymin = X5., ymax = X95.), fill = "#D55E0022") +
  labs(title = "Linear regression vs Logit transformed regression")

The fit looks very similar, except that the model using logit-transformed values lies a bit lower. The slope and the uncertainty seem very similar. Let’s zoom in to get a better idea of the difference.

gglr + geom_line(data = theta, aes(y = X50.), color = "#D55E00DD") +
  geom_ribbon(data = theta, aes(y = X50., ymin = X5., ymax = X95.), fill = "#D55E0022") +
  ylim(.2, .4) +
  annotate("text", label = "Original", x = w$ts[15], y = .307, color = "#FF99FF") +
  annotate("text", label = "Logit", x = w$ts[10], y = .24, color = "#D55E00DD") +
  labs(title = "Zoomed: Linear regression vs Logit transformed regression")
## Warning: Removed 136 rows containing missing values (geom_point).

6.3 Comparison

Which curve fits the data better? One way to determine this is to compare the MSE or MAE of each fit. The result: the curve fits are practically identical. Based on the posterior predictive check, the logit-transformed version is better for prediction, but the difference is so small that it’s not worth the extra complexity.

# MSE
round(c(mean((theta$X50. - w$mug)^2), mean((mu$X50. - w$mug)^2)), digits = 4)
## [1] 0.0173 0.0171
# MAE
round(c(mean(abs(theta$X50. - w$mug)), mean(abs(mu$X50. - w$mug))), digits = 3)
## [1] 0.105 0.107

6.4 Stan Code

cat(get_stancode(fit))
## data {
##   int<lower=0> N;
##   vector[N] y;     // efficiency
##   vector[N] x;     // observation timestamp
## }
## 
## parameters {
##   real alpha;
##   real beta;
##   real<lower=0> sigma;
## }
## 
## transformed parameters {
##   vector[N] mu = alpha + beta * x;
## }
## 
## model {
##   alpha ~ normal(-.85, .85);
##   beta ~ normal(0, .1);
##   sigma ~ normal(.5, .5);
##   logit(y) ~ normal(mu, sigma);
## }
## 
## generated quantities {
##   vector[N] y_rep;    // y_rep for posterior predictive check
##   vector[N] log_lik;  // log_lik for LOO
##   vector[N] theta = inv_logit(mu);
##   
##   for (n in 1:N) {
##     y_rep[n] = inv_logit(normal_rng(mu[n], sigma));
##     log_lik[n] = normal_lpdf(logit(y[n]) | mu[n], sigma);
##   }
## }

7 Hierarchical Model

Add a seasonal component to the model. For this we need to group the data by month, for both the mean and the standard deviation. The two summaries differ only in the statistic, so a single helper function computes both.

group_month_stat <- function(until, stat) {
  # How many years of data we have (if > 50% of a year is present, fill NAs)
  N <- max(year(until)) - 2014
  if (month(until) < 7) N <- N - 1
  data <- setNames(data.table(matrix(nrow = 0, ncol = 12)), month.abb)
  for (n in 1:N) {
    row <- rep(NA, 12)
    monthly <- filter(d, timestamp < until) %>%
      filter(year(timestamp) == (n + 2014)) %>%
      group_by(month = month(timestamp)) %>%
      summarise(value = stat(geneff), .groups = 'drop')
    row[min(monthly$month):max(monthly$month)] <- monthly$value
    data <- rbind(data, t(row), use.names = FALSE)
  }
  # Fill missing months with the column mean
  mutate_all(data, ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))
}
monthly.means <- group_month_stat(last(d$timestamp), mean)
monthly.sds <- group_month_stat(last(d$timestamp), sd)

7.1 Compile and Fit

upto <- nrow(w)
stan_data <- list(N = nrow(monthly.means), J = 12, 
                  mmeans = as.matrix(monthly.means), 
                  msds = as.matrix(monthly.sds),
                  W = upto, geneff = w$mug[1:upto], 
                  wmonth = w$month[1:upto], x = 1:upto)
fit <- stan(file = "hier_season_trend.stan", data = stan_data, refresh = 0,
            seed = 42, sample_file = "hier_sample",
            diagnostic_file = "hier_diag")
print(fit, pars = c("a", "b", "tsigma", "theta", "tau"), probs = c(.05, .5, .95))
## Inference for Stan model: hier_season_trend.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##         mean se_mean   sd    5%   50%  95% n_eff Rhat
## a      -0.04       0 0.03 -0.08 -0.03 0.01   624 1.01
## b       0.00       0 0.00  0.00  0.00 0.00  5827 1.00
## tsigma  0.11       0 0.00  0.11  0.11 0.12  3767 1.00
## theta   0.31       0 0.06  0.22  0.31 0.41  2241 1.00
## tau     0.19       0 0.01  0.18  0.19 0.21  3157 1.00
## 
## Samples were drawn using NUTS(diag_e) at Fri Dec  4 14:30:40 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

7.2 Stan Code with Comments

The hierarchical model adds up the two components: the time series trend and the seasonality. The trend component is the same as before. The hierarchical part is the seasonal component, which finds the common characteristics of each month over the years of data.

cat(get_stancode(fit))
## data {
##   // season data
##   int<lower=1> N;
##   int<lower=1> J;
##   matrix[N, J] mmeans; // monthly means
##   matrix[N, J] msds;   // monthly deviations
##   // trend data
##   int<lower=0> W;      // number of weeks in time series
##   vector[W] geneff;    // weekly generation efficiency
##   int<lower=1, upper=12> wmonth[W];    // observation's month
##   vector[W] x;         // timestep
## }
## 
## parameters {
##   // hierarchical seasonal
##   real theta;
##   real<lower=0> tau;
##   vector[J] monthmu;
##   real<lower=0> monthsd[J];
##   // time series trend
##   real a;               // trend intercept
##   real b;               // trend slope
##   real<lower=0> tsigma; // trend sd
##   real<lower=0> ssigma; // season sd sd
## }
## 
## transformed parameters {
##   vector[W] trend = a + b * x;
##   vector[W] mu;
##   for (n in 1:W) {
##     mu[n] = trend[n] + monthmu[wmonth[n]];
##   }
## }
## 
## model {
##   // hierarchical seasonal
##   theta ~ normal(.3, .2);
##   tau ~ normal(.25, .5);
##   monthmu ~ normal(theta, tau);
##   ssigma ~ normal(.25, .5);
##   monthsd ~ normal(tau, ssigma);
##   for (j in 1:J) {
##     mmeans[, j] ~ normal(monthmu[j], monthsd[j]);
##     msds[, j] ~ normal(monthsd[j], ssigma);
##   }
##   // timeseries
##   a ~ normal(.3, .2);
##   b ~ normal(0, .1);
##   tsigma ~ normal(.25, .5);
##   geneff ~ normal(mu, tsigma);
## }
## 
## generated quantities {
##   vector[W] log_lik; // log_lik for LOO
##   vector[W] yrep;    // y_rep for posterior predictive check
##   
##   for (n in 1:W) {
##     log_lik[n] = normal_lpdf(geneff[n] | mu[n], tsigma);
##     yrep[n] = normal_rng(mu[n], tsigma);
##   }
## }

7.3 Convergence Diagnostics

The hierarchical model is not trivial, so let’s take a closer look at the diagnostics.

check_hmc_diagnostics(fit)
## 
## Divergences:
## 0 of 4000 iterations ended with a divergence.
## 
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 10.
## 
## Energy:
## E-BFMI indicated no pathological behavior.

Everything looks good in terms of the HMC sampling: there are no divergences or pathological behavior.

stan_rhat(fit, bins = 30)

The Rhat values are <1.01, which implies that the chains have converged properly.

stan_ess(fit, bins = 30)

The effective sample size histogram shows some low values. Let’s take a closer look:

ddiag <- as.data.frame(monitor(fit, print = FALSE))
ddiag %>% arrange(n_eff) %>% select(Rhat, n_eff, Bulk_ESS, Tail_ESS) %>% head()
ddiag %>% slice_max(Rhat, n = 5, with_ties = FALSE) %>% select(Rhat, n_eff)

Even the lowest effective sample sizes are >500, so the results are reasonably reliable.

7.4 Posterior Predictive Check

yrep <- extract(fit, pars = 'yrep')$yrep
bayesplot_grid(
  ppc_dens_overlay(w$mug, yrep[1:50, ]),
  ppc_stat(w$mug, yrep[1:500,], stat = function(x){sum(x<0)}, binwidth = 1),
  grid_args = list(ncol=2), titles = c("Simulated", "<0 values"))

The posterior predictions are similar to the linear Gaussian model’s, and the trend slope is again slightly positive.

mcmc_hist(fit, pars = c('b'))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

7.4.1 Seasonal Component

mcmc_areas_ridges(fit, pars = "theta", regex_pars = "monthmu") +
  vline_at(mean(extract(fit, pars = "theta")$theta))

The differences between months are quite big, and follow the thermal seasons presented in the Introduction surprisingly well.

According to the model, the wind power generation efficiency is highest in November through March; April, September, and October are in the middle; and May through August have the lowest average generation efficiency.

7.5 Plot

mu <- apply(extract(fit)$mu, 2, quantile, c(0.05, 0.5, 0.95)) %>%
  t() %>% data.frame(ts = w$ts, .)
(gglr <- ggw + 
  geom_line(data = mu, aes(y = X50.), color = "#D55E00DD") +
  geom_ribbon(data = mu, aes(y = X50., ymin = X5., ymax = X95.), fill = "#D55E0044") +
  labs(title = "Hierarchical regression and its 90% probability interval"))

7.6 PSIS-LOO

loo(fit, save_psis = FALSE)
## 
## Computed from 4000 by 298 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    215.9 11.8
## p_loo        13.1  1.1
## looic      -431.8 23.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

The LOO estimate is accurate with all k values < 0.5. elpd_loo is 216.

# MSE
mean((mu$X50. - w$mug)^2)
## [1] 0.01255666

8 Model Comparison

We’ll compare the linear Gaussian model, “M1”, and the hierarchical seasonal model, “M2”.

Model  MSE    elpd_loo
M1     0.017  180
M2     0.013  216

MSE measures the fit of the curve; M2 fits a bit better. elpd_loo estimates the predictive performance; here again M2 has the better value, and the difference is significant when compared with the elpd_loo standard errors.
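
The elpd difference and its standard error can also be computed directly with loo_compare from the loo package; a minimal sketch, assuming the hierarchical fit object fit and the M1 loo object loo from section 5.5 are still in scope:

# loo_compare reports elpd differences relative to the best model,
# together with the standard error of each difference
loo.2 <- loo(fit)        # hierarchical seasonal model (M2)
loo_compare(loo, loo.2)  # 'loo' holds the M1 estimate from above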

M2 is the better model. The problem is that both metrics are computed without making a distinction between the past and the future. To really get an idea of the predictive performance, we’d need to predict the future and check the prediction accuracy, or use Leave-Future-Out instead of Leave-One-Out.

9 Predictive Performance

The data is so noisy that the predictive performance for generation efficiency is far from perfect, and probably not very interesting beyond long-term averages.

The monthly variation, on the other hand, is a much stronger case for prediction. We can confidently state that the expected generation efficiency in July is almost certainly lower than in the winter months.
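
That claim can be checked directly from the posterior draws of the monthly effects:

# Posterior probability that July's seasonal effect is below January's
mm <- extract(fit, pars = "monthmu")$monthmu  # draws x 12 matrix
mean(mm[, 7] < mm[, 1])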

The trend slope is positive. The predicted yearly (52-week) increase is, with 90% probability, between 0.8% and 2.1%, with a mean of 1.4%.

# Weekly increase according to the trend slope
beta <- extract(fit, pars = "b")$b
c(mean(beta), quantile(beta, c(.05, .5, .95)))
##                        5%          50%          95% 
## 0.0002750601 0.0001498293 0.0002751220 0.0003998332
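
Scaling the weekly slope by 52 reproduces the yearly figures quoted above (mean about 0.014, 90% interval about 0.008 to 0.021):

# Yearly (52-week) increase implied by the weekly trend slope
c(mean(beta), quantile(beta, c(.05, .5, .95))) * 52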

9.1 Forecasting Next Six Weeks

pred.data <- list(x = 298:303, # Forecast 6 weeks, until Dec 26th 2020
                  ts = c(last(w$ts), last(w$ts) + weeks(1:5)))
pred.pars <- extract(fit, pars = c("a", "b", "tsigma", "monthmu[11]", 
                                   "monthmu[12]", "monthsd[11]", "monthsd[12]"))
# 4000 x 6 weeks
pred.trend <- as.vector(pred.pars$a) + pred.pars$b %*% t(pred.data$x)
# Replicate each month's draws across its forecast weeks (2 in November,
# 4 in December), filling column-wise to keep draws aligned with the trend rows
pred.season <- cbind(matrix(rep(pred.pars$"monthmu[11]", 2), ncol=2),
                     matrix(rep(pred.pars$"monthmu[12]", 4), ncol=4))
pred.seassd <- cbind(matrix(rep(pred.pars$"monthsd[11]", 2), ncol=2),
                     matrix(rep(pred.pars$"monthsd[12]", 4), ncol=4))
ssize <- 4000
pred.mu <- pred.trend + pred.season
pred.pred <- matrix(data = 0, nrow = ssize, ncol = 6)
for (i in 1:6) {
  pred.pred[,i] <- rnorm(ssize, mean = pred.mu[,i], sd = pred.pars$tsigma)
}
pred.quant <- apply(pred.pred, 2, quantile, c(0.05, 0.5, 0.95)) %>%
  t() %>% data.frame(ts = pred.data$ts, .)
pred.quantmu <- apply(pred.mu, 2, quantile, c(0.05, 0.5, 0.95)) %>%
  t() %>% data.frame(ts = pred.data$ts, .)
ggw +
  geom_line(data = mu, aes(y = X50.), color = "#D55E00DD") +
  geom_ribbon(data = mu, aes(y = X50., ymin = X5., ymax = X95.), fill = "#D55E0044") +
  geom_line(data = pred.quant, aes(y = X50.), color = "#FF99FFDD") +
  geom_ribbon(data = pred.quant, aes(y = X50., ymin = X5., ymax = X95.), fill = "#FF99FF44") +
  geom_line(data = pred.quantmu, aes(y = X50.), color = "#FF99FFDD") +
  geom_ribbon(data = pred.quantmu, aes(y = X50., ymin = X5., ymax = X95.), fill = "#FF99FF44") +
  labs(title = "Prediction and its 90% probability interval") +
  xlim(w$ts[220], pred.data$ts[6])
## Warning: Removed 219 rows containing missing values (geom_point).
## Warning: Removed 219 row(s) containing missing values (geom_path).

As expected, the 90% interval is wide. While that’s realistic, it’s not necessarily useful for anything practical.

10 Sensitivity Analysis

We have many observations, relatively simple data, and reasonably good prior knowledge, so the assumption is that the analysis is not sensitive to the choice of priors. Let’s test the assumption by changing the priors to gamma(0.25, 1) for the standard deviation parameters and gamma(3, 10) for the efficiency parameters, in both the trend and the hierarchical seasonal parts of the model. These gamma priors are arguably more natural because they constrain the values to be positive, which is realistic.
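
For intuition, the alternative priors roughly preserve the means of the original normal priors (gamma(3, 10) has mean 3/10 = 0.3, gamma(0.25, 1) has mean 0.25) while putting no mass below zero:

# Compare the shapes of the two alternative gamma priors; the
# gamma(0.25, 1) density spikes towards zero (clipped here)
curve(dgamma(x, shape = 3, rate = 10), from = 0.01, to = 1, ylim = c(0, 3),
      col = "salmon", ylab = "density", main = "Alternative gamma priors")
curve(dgamma(x, shape = 0.25, rate = 1), add = TRUE, col = "blue", lty = 2)
legend("topright", legend = c("gamma(3, 10)", "gamma(0.25, 1)"),
       col = c("salmon", "blue"), lty = c(1, 2))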

upto <- nrow(w)
stan_data <- list(N = nrow(monthly.means), J = 12, 
                  mmeans = as.matrix(monthly.means), 
                  msds = as.matrix(monthly.sds),
                  W = upto, geneff = w$mug[1:upto], 
                  wmonth = w$month[1:upto], x = 1:upto)
fit <- stan(file = "hier_season_trend_altprior.stan", data = stan_data, refresh = 0, seed = 42)
print(fit, pars = c("a", "b", "tsigma", "theta", "tau"), probs = c(.05, .5, .95))
## Inference for Stan model: hier_season_trend_altprior.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##        mean se_mean   sd   5%  50%  95% n_eff Rhat
## a      0.03       0 0.01 0.01 0.02 0.05  1278    1
## b      0.00       0 0.00 0.00 0.00 0.00  3112    1
## tsigma 0.12       0 0.00 0.11 0.11 0.12  4116    1
## theta  0.26       0 0.06 0.17 0.26 0.35  3181    1
## tau    0.19       0 0.01 0.18 0.19 0.21  2749    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Dec  4 14:33:50 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
loo(fit, save_psis = FALSE)
## 
## Computed from 4000 by 298 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    215.3 12.0
## p_loo        12.8  1.1
## looic      -430.6 23.9
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
stan_rhat(fit, bins = 30)

stan_ess(fit, bins = 30)

The results are similar with the alternative priors. The model with these data is not sensitive to reasonable prior choice.

11 Potential Improvements

The models presented above work, and the convergence diagnostics are good. Still, there are many interesting paths forward.

11.1 Data Improvements

As seen in the plots, the data are very noisy. A good approach could be to include long moving averages in the model to dampen the effect of the stochasticity. Another idea would be to slice the data by, for example, geographical area, and perhaps find a division within which the variation is smaller.

The logit data transformation demonstrated above resulted in a better fit in terms of simulated data, especially in the tails. While the tested metrics didn’t show much improvement, it would still be worthwhile to study the effect, especially for forecasting.

11.2 Modeling Ideas

Augmenting the model with weather data could improve accuracy. Adding monthly seasonal effects already made a difference, so more informative seasonal or weather data might help as well.

There are probably many good ideas in traditional time series analysis. Testing an ARIMA model, for example, could provide new insight. An auto-regressive component would probably help, even with the high variance of the data.

The current model assumes a linear trend, which can’t be realistic in the long term because the efficiency can’t exceed 1. In the long run this would need to be fixed, for example with a logistic-growth model.
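
One common form is the logistic function

\[ g(t) = \frac{C}{1 + \exp(-k(t - m))}, \]

where \(C\) is the carrying capacity (here at most 1), \(k\) the growth rate, and \(m\) the midpoint.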

11.3 Diagnostic Improvements

Instead of working with metrics that use all the data, we could consider metrics that differentiate between the past and the future. Such metrics are arguably better for estimating predictive power.

12 Reflection and Lessons Learnt

This exercise revealed the continuous nature of data science: there’s always the next modeling idea to test, the next parametrization to try, the next visualization to clarify the results, and so on. In that way it resembles the Bayesian method itself, which yields distributions instead of point estimates. There is rarely a single correct answer to a modeling question; instead, “it depends”, as if there were a distribution of possible good answers.

Much of data analysis is really about the data, not the model or the modeling techniques. With good, abundant data everything is easy (at least in relative terms); with bad data it’s difficult to come up with meaningful models and results. Investing in the first step of data analysis, good data, may well pay off in the later steps.

An idea that feels great may turn out inconsequential. For example, getting a more precise posterior distribution via a data transformation seemed like a great step forward, until the metrics were checked. The data and model may be robust enough that there really isn’t much more to learn from the data, even with clever ideas.

13 Conclusion

Wind power capacity has more than tripled during the past six years. At the same time, the average generation efficiency of the wind turbines has increased by about 8 percentage points.

Wind power varies seasonally. In July, the efficiency is on average only about half of that in the winter months. In Finland, most energy is consumed in the cold winter, so wind power has the potential to take a bigger share of electricity generation.

The daily variation of wind power generation is huge. Because electricity is difficult to store, this variance is a problem.

Interesting further work would be to calculate different metrics from the data. Instead of the generation efficiency, we could compute and predict, for example, the lengths of streaks of very low generation. The streak length is crucial in many cases: if heating goes off for an hour or a few, it often doesn’t matter, but if heating stays off for days, it’s potentially very detrimental. The harm is not linear but grows exponentially with the length of the streak.
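
As a starting point, the streak lengths are easy to extract from the hourly data; a sketch, assuming “very low” is defined as efficiency below 0.1 (a hypothetical threshold):

# Lengths (in hours) of consecutive runs of very low generation
runs <- rle(d$geneff < 0.1)
summary(runs$lengths[runs$values])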