MCMC - the continuation

In today’s exercises we will analyse more complicated distributions (instead of, e.g., the \(Beta\) distribution), for which the posterior distribution is not easy to derive analytically, so MCMC sampling is indeed needed.

We will use the evd package (so you need to install it).

#install.packages('evd')
library(evd)

The evd package contains all major distributions used in Extreme Value Theory. We will focus on one of them, namely the Gumbel distribution. This distribution has only two parameters: a location and a scale.

Don’t worry: we will not do an EVT analysis; we only want to use this distribution for Bayesian inference. From the evd package we will use the dgev() family of functions (rgev(), pgev(), etc.). Note that when the shape parameter is set to 0, you retrieve the Gumbel distribution.
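
For reference, the Gumbel(\(\mu, \sigma\)) density is \(f(x) = \frac{1}{\sigma} \exp\left(-z - e^{-z}\right)\) with \(z = (x - \mu)/\sigma\); this is exactly what dgev() evaluates when the shape parameter is set to 0.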


Task 1

Use a Gumbel distribution \(X \sim Gumbel(0, 0.5)\) to generate a random sample of size 1,000. Plot the kernel density of this sample, together with the theoretical density.

#set.seed(123)

N <- 1e3
mu_sample <- 0
sigma_sample <- 0.5

sample_gumb <- rgev(
  n = N,
  loc = mu_sample,
  scale = sigma_sample,
  shape = 0
)

x <- seq(-1000, 1000, by = 0.01) # Gumbel has unbounded domain !
plot(density(sample_gumb),
     col = "red",
     main = "empircal and theoretical densities",
     lwd = 2)
lines(x, dgev(
  x,
  loc = mu_sample,
  scale = sigma_sample,
  shape = 0
), lwd = 2)
legend(
  "topright",
  lwd = 2,
  col = c("red", "black") ,
  c("density of the sample", "density of 'evd' Gumbel(0, 0.5)")
)


Task 2

Relying on the dgev() function from the evd package, write the log-likelihood function of the Gumbel distribution.

Next, compute the log-likelihood value for the sample from Task 1 (Notice that working with log-values will simplify the computations).
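
For reference, writing \(z_i = (x_i - \mu)/\sigma\), the Gumbel log-likelihood of a sample \(x_1, \dots, x_n\) is \(\ell(\mu, \sigma) = -n \log \sigma - \sum_i z_i - \sum_i e^{-z_i}\); summing the log-densities returned by dgev(..., log = TRUE), as in the solution below, computes exactly this quantity.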

'logLikelihood_gumbel' <- function(mu, sigma, data) {
  llhd <- evd::dgev(
    x = data,
    loc = mu,
    scale = sigma,
    shape = 0,
    log = TRUE
  )
  # na.rm = TRUE would guard against missing values in the sample,
  # but this is not the case here!
  return(sum(llhd, na.rm = TRUE))
}
logLikelihood_gumbel(mu = mu_sample, sigma = sigma_sample, data = sample_gumb)
## [1] -893.6924

Task 3

Consider a non-informative prior, in the form of a large-variance normal prior (mean 0, sd = 50) on each of the two parameters. Using the likelihood function from Task 2, write a function that computes the log-posterior. (HINT: if you work with log values, you can simply sum the terms.) Then compute the log-posterior value of the generated sample.
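
In log form the (unnormalised) posterior is simply \(\log p(\mu, \sigma \mid x) = \ell(\mu, \sigma) + \log p(\mu) + \log p(\sigma) + \text{const}\), i.e. the log-likelihood from Task 2 plus the two log-prior terms.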

'logPosterior_gumb' <- function(mu, sigma, data) {
  llhd <- dgev(
    x = data,
    loc = mu,
    scale = sigma,
    shape = 0,
    log = TRUE
  )
  llhd <- sum(llhd, na.rm = TRUE)

  lprior <- dnorm(mu, sd = 50, log = TRUE)
  lprior <- lprior + dnorm(sigma, sd = 50, log = TRUE)

  return(lprior + llhd)
}
logPosterior_gumb(mu = mu_sample, sigma = sigma_sample, data = sample_gumb)
## [1] -903.3544

This value should be close to the log-likelihood found in Task 2: each flat prior contributes about \(\log(1/(50\sqrt{2\pi})) \approx -4.83\), so the log-posterior is only about 9.66 lower than the log-likelihood, which matches the two values above.


Task 4

Now you have all the parts needed to build the Gibbs sampler.

  • Take the mean and the SD of the generated sample from Task 1 as starting values for the location and the scale parameters, and make 1,000 iterations.

  • Initialize the output data-frame (or matrix) which will contain the generated chains for the parameters.

  • Run the algorithm by taking normal distributions for the proposals (these are symmetric distributions) with 0.05 and 0.01 as the respective standard deviations; the acceptance ratio this leads to is recalled right after this list.
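
Recall that with a symmetric proposal the Metropolis acceptance probability for the component being updated reduces to \(\alpha = \min\{1, \exp(\log p(\theta' \mid x) - \log p(\theta \mid x))\}\), where \(\theta'\) is the proposed value; this is the ratio r computed in the code below.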

iter <- 1000
# The standard deviations of the proposals
proposal.sd  <- c(.05, .01)

# Starting values: the mean and the SD of the sample from Task 1
start_mu    <- mean(sample_gumb)
start_sigma <- sd(sample_gumb)

# Initialization
out <- data.frame(mu = rep(NA, iter + 1),
                  sigma = rep(NA, iter + 1))
# Starting values in the first row
out[1, ] <- c(start_mu, start_sigma)

lpost_old <- logPosterior_gumb(mu = out[1, 1], sigma = out[1, 2],  data = sample_gumb)


# Reproducible example
set.seed(1234)

for (t in 1:iter) {

  ## 1) For the parameter mu

  prop.mu <- rnorm(1, mean = out[t, 1], proposal.sd[1])
  # The proposal is symmetric, so it cancels in the acceptance ratio.

  lpost_prop <- logPosterior_gumb(prop.mu, out[t, 2],  data = sample_gumb)

  r <- exp(lpost_prop - lpost_old) # since the proposal is symmetric

  # Accept
  if (r > runif(1)) {
    out[t + 1, 1] <- prop.mu
    lpost_old <- lpost_prop
  } # reject
  else  out[t + 1, 1] <- out[t, 1]

  ## 2) For the parameter sigma
  prop.sigma <- rnorm(1, mean = out[t, 2], proposal.sd[2])
  # The proposal is symmetric, so it cancels in the acceptance ratio.

  # Condition on the most recent value of mu (the one just updated above)
  lpost_prop <- logPosterior_gumb(out[t + 1, 1], prop.sigma, data = sample_gumb)

  r <- exp(lpost_prop - lpost_old) # since the proposal is symmetric

  # Accept
  if (r > runif(1)) {
    out[t + 1, 2] <- prop.sigma
    lpost_old <- lpost_prop
  } # reject
  else  out[t + 1, 2] <- out[t, 2]
}

Task 5

  1. Check visually that the chains of the Gibbs sampler converge (a traceplot sketch is given right after these questions). Do they seem stationary?

  2. Then, check that the chains are indeed updated one at a time, whereas they are updated simultaneously in the Metropolis algorithm. HINT: recall the function for the Metropolis algorithm created in the previous set of exercises.

  3. What do you observe with the starting values used to generate these chains?
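
A minimal base-R sketch for the traceplots, assuming the data frame out and the true values mu_sample and sigma_sample from the previous tasks are still in the workspace:

par(mfrow = c(1, 2))
plot(out$mu, type = "l",
     xlab = "Iteration", ylab = "mu", main = "Traceplot of mu")
abline(h = mu_sample, col = "red", lty = 2)    # true location
plot(out$sigma, type = "l",
     xlab = "Iteration", ylab = "sigma", main = "Traceplot of sigma")
abline(h = sigma_sample, col = "red", lty = 2) # true scale
par(mfrow = c(1, 1))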

  1. First of all, the chains look stationary. Another thing to do would be to run the chains with different starting values and check whether convergence still occurs.

  2. We can see in the above plots that the two parameters are updated one at a time (their chains do not move at the same iterations), unlike in the Metropolis algorithm.

  3. The starting values lie well away from the region where the probability mass of the chains is concentrated, yet the chains reach that region quickly; this is good news regarding convergence.


Task 6

The acceptance rate of the Gibbs sampler (i.e. the proportion of times that a proposal is accepted in the algorithm) is crucial for the efficiency of the algorithm. Complete the previous code in order to retrieve the acceptance rate of this Gibbs sampler.

# Initialization
# Starting values in the first row
out[1, ] <- c(start_mu, start_sigma)

lpost_old <- logPosterior_gumb(mu = out[1, 1], sigma = out[1, 2],  data = sample_gumb)

# Add this
acc_rates <- matrix(nrow = iter, ncol = 2)

# Reproducible example
set.seed(1234)

for (t in 1:iter) {
  ## 1) For the parameter mu

  prop.mu <- rnorm(1, mean = out[t, 1], proposal.sd[1])
  # The proposal is symmetric, so it cancels in the acceptance ratio.

  lpost_prop <-
    logPosterior_gumb(prop.mu, out[t, 2],  data = sample_gumb)

  r <- exp(lpost_prop - lpost_old) # since the proposal is symmetric

  # Accept
  if (r > runif(1)) {
    out[t + 1, 1] <- prop.mu
    lpost_old <- lpost_prop
  } # reject
  else
    out[t + 1, 1] <- out[t, 1]
  acc_rates[t, 1] <- min(r, 1)  # acceptance probability for mu

  ## 2) For the parameter sigma
  prop.sigma <- rnorm(1, mean = out[t, 2], proposal.sd[2])
  # The proposal is symmetric, so it cancels in the acceptance ratio.

  # Condition on the most recent value of mu (the one just updated above)
  lpost_prop <-
    logPosterior_gumb(out[t + 1, 1], prop.sigma, data = sample_gumb)

  r <- exp(lpost_prop - lpost_old) # since the proposal is symmetric

  # Accept
  if (r > runif(1)) {
    out[t + 1, 2] <- prop.sigma
    lpost_old <- lpost_prop
  } # reject
  else
    out[t + 1, 2] <- out[t, 2]
  acc_rates[t, 2] <- min(r, 1)  # acceptance probability for sigma

}

# Mean acceptance rate for the chains of BOTH parameters
apply(acc_rates, 2, mean)
## [1] 0.3640119 0.6685295

Task 7

Gather everything you did earlier into one function (add a parameter for the burn-in inside the function).

# The function :
# - start specifies the vector of starting values for mu and sigma
# - proposal.sd specifies the standard deviations of the proposals for mu and sigma
# - data specifies the dataset (sample) to use
# - iter specifies the number of iterations for the MCMC sampler.
# - (bonus!) burnin specifies the number of initial iterations to discard (for convergence issues)
'gibbsSampler.fun' <- function(start,
                               proposal.sd,
                               data = sample_gumb,
                               iter = 1e3,   # Default values after "=" sign
                               burnin = 0) {
  # Initialization
  out <- data.frame(mu    = rep(NA, iter + 1),
                    sigma = rep(NA, iter + 1))
  # Starting values in the first row
  out[1, ] <- start

  lpost_old <-
    logPosterior_gumb(mu = out[1, 1], sigma = out[1, 2], data = data)

  # Store the acceptance probabilities
  acc_rates <- matrix(nrow = iter, ncol = 2)

  # Reproducible example
  set.seed(1234)

  for (t in 1:iter) {
    ## 1) For the parameter mu
    prop.mu <- rnorm(1, mean = out[t, 1], proposal.sd[1])
    # The proposal is symmetric, so it cancels in the acceptance ratio.

    lpost_prop <-
      logPosterior_gumb(prop.mu, out[t, 2], data = data)

    r <- exp(lpost_prop - lpost_old) # since the proposal is symmetric

    # Accept
    if (r > runif(1)) {
      out[t + 1, 1] <- prop.mu
      lpost_old <- lpost_prop
    } # reject
    else
      out[t + 1, 1] <- out[t, 1]
    acc_rates[t, 1] <- min(r, 1)

    ## 2) For the parameter sigma
    prop.sigma <- rnorm(1, mean = out[t, 2], proposal.sd[2])

    # Condition on the most recent value of mu
    lpost_prop <-
      logPosterior_gumb(out[t + 1, 1], prop.sigma, data = data)

    r <- exp(lpost_prop - lpost_old) # since the proposal is symmetric

    # Accept
    if (r > runif(1)) {
      out[t + 1, 2] <- prop.sigma
      lpost_old <- lpost_prop
    } # reject
    else
      out[t + 1, 2] <- out[t, 2]
    acc_rates[t, 2] <- min(r, 1)
  }

  # Remove the burn-in period (if any) before returning the chains
  out.chain <- if (burnin > 0) out[-(1:burnin), ] else out

  return(list(
    mean.acc_rates = apply(acc_rates, 2, mean),
    out.chain = data.frame(out.chain, iter = 1:nrow(out.chain))
  ))
}

Task 8

It is recommended to target acceptance rates of around 40% for the Gibbs sampler since components are updated one at a time. Use the function created above to find better values for the proposal standard deviations in order to reach these target acceptance rates.

# Run the function
test_GibbsSampler <-
  gibbsSampler.fun(start =  c(start_mu, start_sigma),
                   proposal.sd =  c(.05, .01))
# Check the acceptance rate
test_GibbsSampler$mean.acc_rates
## [1] 0.3640119 0.6685295

After some tweaks you should get something like:

## [1] 0.4294051 0.4021660

In the initial run the acceptance rate is slightly too high for \(\sigma\) and not high enough for \(\mu\). Play with proposal.sd to bring both rates close to 0.4.

When you are ready, visualize the new traceplots obtained with the tuned proposal standard deviations, for instance along the lines of the sketch below.
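
A possible re-tuned run, reusing gibbsSampler.fun() from Task 7 and storing the result as test_GibbsSampler2 (which the Task 9 plots assume is available). The proposal standard deviations 0.04 and 0.02 are only an initial guess, not guaranteed to hit the 40% target exactly:

test_GibbsSampler2 <-
  gibbsSampler.fun(start = c(start_mu, start_sigma),
                   proposal.sd = c(.04, .02))  # guessed values: tune further if needed
test_GibbsSampler2$mean.acc_rates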


Task 9

Provide a nice visualization (ggplot2) of the chains that also shows the Bayesian posterior estimates of the parameters (Take the median).

library(ggplot2)
library(gridExtra)
"theme_piss" <- function(size_p = 18,
                         size_c = 14,
                         size_l = 12,
                         theme = theme_bw(),
                         ...) {
  text <-
    function(size, ...)
      element_text(size = size,
                   colour = "#33666C",
                   face = "bold",
                   ...)
  theme +
    theme(
      plot.title = text(size_p, hjust = 0.5),
      axis.title = text(size_c),
      legend.title = text(size_l)
    ) +
    theme(
      legend.background = element_rect(colour = "black"),
      legend.key = element_rect(fill = "white"),
      ...
    )
}


gg_mu <- ggplot(test_GibbsSampler2$out.chain) +
  geom_hline(aes(yintercept = median(mu), col = "Posterior Estimate"), linetype = "dashed") +
  geom_hline(aes(yintercept = mu_sample,
             col = "True value"), linetype = "dashed") +
  geom_line(aes(x = iter, y = mu)) +
  coord_cartesian(ylim = c(-0.05, 0.1)) + # Recenter plot
  labs(title = "Traceplot for mu") +
  scale_colour_manual(name = "Value", values = c( "True value" = "blue", "Posterior Estimate" = "red")) +
  theme_piss(legend.position = c(0.85, 0.85))

gg_sigma <- ggplot(test_GibbsSampler2$out.chain) +
  geom_hline(aes(yintercept = median(sigma), col = "Posterior Estimate"), linetype = "dashed") +
  geom_hline(aes(yintercept = sigma_sample,
             col = "True value"), linetype = "dashed") +
  geom_line(aes(x = iter, y = sigma)) +
  coord_cartesian(ylim = c(0.46, 0.55)) + # Recenter plot
  labs(title = "Traceplot for sigma") +
  scale_colour_manual(name = "Value", values = c( "True value" = "blue", "Posterior Estimate" = "red")) +
  theme_piss(legend.position = c(0.85, 0.85))

# The blue line represents the true value, and the red line is the (median) posterior estimate
grid.arrange(gg_mu, gg_sigma, nrow = 1)

Check whether increasing the number of iterations or adding a burn-in period could improve the results, e.g. increase the number of iterations (to 5k, 10k, …) and add a burn-in period (1k, 2k, …), checking that the acceptance rates stay close to the target:

## [1] 0.4180476 0.4007492
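
A sketch of such a longer run (the iteration count, the burn-in length and the name test_GibbsSampler3 are arbitrary illustrative choices):

test_GibbsSampler3 <-
  gibbsSampler.fun(start = c(start_mu, start_sigma),
                   proposal.sd = c(.04, .02),  # tuned values from Task 8
                   iter = 1e4,                 # example: 10,000 iterations
                   burnin = 2e3)               # example: discard the first 2,000
test_GibbsSampler3$mean.acc_rates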

Concluding remarks

Does increasing the number of iterations or adding a burn-in period improve the results?