1 Background

BIT worked with the AER to conduct an online framed field experiment to test different versions of a proposed benefit change notification. This trial is the second in a series of three. 2,285 respondents saw one of four versions of the notification and were asked about their intended behaviour.

Following trial 1, we carried the “Lose discount” letter forward to trial 2, on the basis that it had the highest comprehension and may have reduced the proportion of respondents doing nothing.

In addition to the original two comprehension questions, we added three extra questions to allow for further analysis of comprehension. We measured “narrow” comprehension (comprehension on the two questions that appeared in trial 1) as well as “broad” comprehension (comprehension across all five questions). Finally, we removed the section that asked participants to enter their information into EME – we did not expect it to yield any new information, and it saw significant attrition from respondents.

The key findings included:

  • Intention to visit EME did not appear to vary by treatment
  • The “lose discount” headlines led to higher comprehension

This document is intended as a supporting document to the final report, giving technical details of the analysis underpinning these findings.

2 Intervention

Trial 2 saw three new letters – a “no headline” letter, a letter that expressed the loss as a percentage rather than a dollar amount, and a letter that added a social norm to the “lose discount” letter.

Below are the four versions of the letter that were trialled.

  • No headline
  • Loss headline ($)
  • Loss headline (%)
  • Loss + Social norm

3 Estimation strategy

3.1 Statistical model

Before the experiment was run, a number of analyses were pre-specified to ensure that the findings would be robust to future scaling and replication. Below, we have presented the results of all pre-specified analyses.

For each outcome, we have presented two estimates of the impact of each element of the letter that was varied:

  1. A regression using just the characteristics of the letter as a predictor

\[\text{Outcome}_i = \beta_1 \cdot \text{Letter 1}_i + \beta_2 \cdot \text{Letter 2}_i + \beta_3 \cdot \text{Letter 3}_i + \beta_4 \cdot \text{Letter 4}_i + \epsilon_i\]

  2. A regression using the random allocation and a set of covariates to improve precision

\[ \begin{aligned} \text{Outcome}_i = &\beta_1 \cdot \text{Letter 1}_i + \beta_2 \cdot \text{Letter 2}_i + \beta_3 \cdot \text{Letter 3}_i + \beta_4 \cdot \text{Letter 4}_i + \\ &\beta_5 \cdot \text{income}_i + \beta_6 \cdot \text{education}_i + \beta_7 \cdot \text{numeracy score}_i + \\ &\beta_8 \cdot \text{switched provider}_i + \beta_9 \cdot \text{switched plan}_i + \epsilon_i \end{aligned} \]

Additionally, if the outcome is binary, we have presented the results from both a standard linear regression and the estimated average marginal effect from a logistic regression. All standard errors reported are heteroskedasticity-consistent (‘robust’) standard errors.
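
For illustration, the sketch below shows one way this pair of models could be estimated for a binary outcome. It is not the internal estimate_models() helper used in the analysis code later in this document (whose implementation is not reproduced here), and the use of the estimatr and margins packages is an assumption.

# A minimal sketch of the two models described above, not the internal
# estimate_models() helper used elsewhere in this document. Column names
# (treatment, income_mid, education_alt, num_score, energy1, energy2)
# follow those used in the balance code below; `outcome` is a 0/1 variable.
library(estimatr)   # lm_robust(): OLS with heteroskedasticity-consistent SEs
library(margins)    # average marginal effects from a logistic regression

# 1. Treatment-only linear model
model_simple <- lm_robust(outcome ~ factor(treatment), data = results)

# 2. Covariate-adjusted linear model
model_adjusted <- lm_robust(
    outcome ~ factor(treatment) + income_mid + education_alt +
        num_score + energy1 + energy2,
    data = results
)

# For binary outcomes: a logistic regression and its average marginal effects
model_logit <- glm(
    outcome ~ factor(treatment) + income_mid + education_alt +
        num_score + energy1 + energy2,
    family = binomial(), data = results
)
summary(margins(model_logit))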

For all analyses, we present a regression table with the estimates of interest, with standard errors in parentheses, and a bar chart showing the estimates for the four treatment groups.

3.2 Covariates

The covariates used in the model are defined as follows (a minimal coding sketch follows the list):

  • Income - entered as a numeric variable, giving the average value of the income bracket they reported
  • Education - entered as a categorical variable, with four possible values:
    1. “Did not finish high school”
    2. “High school graduate”
    3. “Undergraduate”
    4. “Post-graduate”
  • Numeracy score - entered as a numeric variable, based on their performance on the numeracy questions. Ranges from 0 to 4.
  • Switched provider - categorical variable, which indicates the answer given to “How long have you been with your current energy provider?”, chosen from the options below
    • Less than 1 year
    • Between 1 and 2 years
    • Between 2 and 4 years
    • More than 4 years
    • Don’t know
  • Switched plans - a binary variable, indicating whether they’ve ever switched plans with their current provider
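
The sketch below shows one way this coding could be implemented. The level labels follow the definitions above; the raw response column for switched plans (energy2_raw) is hypothetical, as the data preparation script is not reproduced in this document.

# A minimal sketch of the covariate coding described above. education_alt,
# num_score, energy1 and energy2 are the column names used in the analysis
# code below; energy2_raw is a hypothetical raw response column.
library(dplyr)
library(forcats)

results <- results %>%
    mutate(
        education_alt = fct_relevel(education_alt,
                                    "Did not finish high school",
                                    "High school graduate",
                                    "Undergraduate",
                                    "Post-graduate"),
        energy1 = fct_relevel(energy1,
                              "Less than 1 year",
                              "Between 1 and 2 years",
                              "Between 2 and 4 years",
                              "More than 4 years",
                              "Don't know"),
        energy2 = as.integer(energy2_raw == "Yes")  # switched plans: 1 = yes, 0 = no
    )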

4 Balance

Individuals were randomised to see one of the four versions of the letter. Below are summary statistics by group, showing the distribution of the covariates used in the full model across the treatment groups. We did not observe any meaningful differences in the covariates.

covars <- c("education_alt", "num_score", "energy1", "energy2")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Education") %>% 
ggplot(aes(value, n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "Education")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy score",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Numeracy score") %>% 
ggplot(aes(fct_rev(value), n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "Numeracy score")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Switch provider") %>% 
    mutate(value = fct_relevel(value, "Less than 1 year", "Between 1 and 2 years", "Between 2 and 4 years", "More than 4 years")) %>% 
ggplot(aes(fct_rev(value), n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "'How long have you been with your current provider?'")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Switch plan") %>% 
ggplot(aes(value, n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "'Have you ever switched plans with them?'")

results %>% 
    select(treatment, Income = income_mid) %>% 
    gather(key, value, -treatment) %>%
    mutate(treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
ggplot(aes(value, fct_rev(treatment), height = n, scale = 0.05, fill = treatment)) +
    ggridges::geom_ridgeline( colour = "white") +
    scale_x_continuous(breaks = pretty_breaks(10), label = dollar) +
    scale_fill_BIT() +
    labs(x = "", y = "")

5 Descriptives

Respondents were asked how easy they thought it would be to get a better deal through each of four methods (using a comparison website, using Energy Made Easy, calling their retailer, or doing their own research), on a scale from 1 to 10. The chart below shows the estimated average rating for each method.

results %>% 
    select(id, starts_with("difficulty_")) %>% 
    gather(key, val, -id) %>% 
    lm(val ~ key, data = .) %>% 
    bar_data() %>% 
    select(-term, term = old_term) %>% 
        mutate(term = str_replace(term, "keydifficulty_sq00|Treatment ", "")) %>% 
    mutate(
           term = case_when(.$term == "1" ~ "Using a comparison website",
                            .$term == "2" ~ "Using Energy Made Easy",
                            .$term == "3" ~ "Calling my retailer",
                            .$term == "4" ~ "Doing my own research"),
           label_text = estimate %>% sprintf("%.1f", .),
           stars = NA_character_
           ) %>% 
    ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.3) +
    scale_y_continuous(breaks = pretty_breaks(4)) +
    hide_legend() +
    labs(x = "", y = "")

# 
# save_plot("research.png")
# 
# results %>% 
#     select(id, starts_with("difficulty_")) %>% 
#     gather(key, val, -id) %>% 
#     lm(val ~ key, data = .) %>% 
#     bar_data() %>% 
#     select(-term, term = old_term) %>% 
#     mutate(estimate = estimate / 100,
#            ymax = ymax / 100,
#            ymin = ymin / 100,
#            term = str_replace(term, "keydifficulty_sq00|Treatment ", ""),
#            stars = NA_character_) %>% 
#     mutate(
#            term = case_when(.$term == "1" ~ "Using a comparison website",
#                             .$term == "2" ~ "Using Energy Made Easy",
#                             .$term == "3" ~ "Calling my retailer",
#                             .$term == "4" ~ "Doing my own research")
#            ) %>% 
#     bit_hchart() %>% 
#     hc_axis_score(1) %>% 
#     hc_title(text = "How easy do you think it would be to get a better deal through the following methods (1 - 10)", style = list(color = BIT[2], fontWeight = "bold", fontSize = 14))
#  

6 Analysis

6.1 Primary analysis

6.1.1 Did the treatment increase the likelihood that consumers would use the EME website?

Measure: Respondents were first asked what they would do on receipt of the letter – take action immediately, within a week, when they had time, or not take action at all. For those that chose any of the options that indicated they would take action, we then asked what action they would take. The options included visiting EME, visiting a non-EME comparison site, calling the retailer, doing research online (not via comparison sites), or something else.

For this analysis, the outcome variable is 1 if they responded that they would go to EME, and 0 otherwise.
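
A minimal sketch of this outcome coding is shown below; the raw response columns (first_action and action_type) and their response labels are hypothetical stand-ins, as the survey export is not reproduced in this document.

# A minimal sketch of the outcome coding described above. first_action and
# action_type (and their response labels) are hypothetical; choose_EME is
# the variable used as the outcome in the models below.
library(dplyr)

results <- results %>%
    mutate(
        choose_EME = as.integer(
            first_action != "Not take action at all" &
            coalesce(action_type == "Visit Energy Made Easy (EME)", FALSE)
        )
    )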

EME_models <- results %>% 
    mutate(outcome = choose_EME) %>% 
    estimate_models()
    
EME_models %>% 
    print_reg() %>% 
    shorten_table(percent_table = T)
                           OLS        Logistic   OLS        Logistic
(Intercept)                38.1% ***             24.3% **
                           (2.0%)                (8.0%)
Treatment 2                -5.3% +    -5.3% +    -4.7% +    -4.8% +
                           (2.8%)     (2.8%)     (2.8%)     (2.8%)
Treatment 3                -2.3%      -2.3%      -2.4%      -2.4%
                           (2.9%)     (2.8%)     (2.9%)     (2.8%)
Treatment 4                -1.5%      -1.5%      -1.8%      -1.8%
                           (2.8%)     (2.8%)     (2.8%)     (2.8%)
Controls for demographics  No         No         Yes        Yes
Controls for numeracy      No         No         Yes        Yes
N                          2,285      2,285      2,285      2,285
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

Based on the trial, the proportion of respondents that said they would take some action and visit EME did not vary between the treatments, and was consistently around 35%. There is some evidence to suggest that the Loss headline ($) was less effective than the other letters, but this difference was not large enough to be statistically significant at conventional levels.

EME_models[[1]] %>%
bar_data() %>%
ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar() +
    scale_y_continuous(label = percent) +
    hide_legend() +
    labs(x = "", y = "")

# save_plot("choose_EME.png")
# 
# EME_models[[1]] %>%
#     bar_data() %>%
#     bit_hchart() %>% 
#     hc_axis_perc()
#     

6.1.2 Did the treatment decrease the likelihood that the consumer would do nothing?

Measure: Respondents were first asked what they would do on receipt of the letter – take action immediately, within a week, when they had time, or not take action at all. For those that chose any of the options that indicated they would take action, we then asked what action they would take. The options included visiting EME, visiting a non-EME comparison site, calling the retailer, doing research online (not via comparison sites), or something else.

For this analysis, the outcome variable is 1 if they responded that they would ‘Do nothing’, and 0 otherwise.

do_nothing_models <- results %>% 
    mutate(outcome = do_nothing) %>% 
    estimate_models()
    
do_nothing_models %>% 
    print_reg() %>% 
    shorten_table(percent_table = T)
                           OLS        Logistic   OLS        Logistic
(Intercept)                4.7% ***              7.2% +
                           (0.9%)                (4.3%)
Treatment 2                -0.2%      -0.2%      -0.0%      -0.1%
                           (1.2%)     (1.2%)     (1.2%)     (1.2%)
Treatment 3                2.1%       2.1%       2.2%       2.2%
                           (1.4%)     (1.4%)     (1.3%)     (1.3%)
Treatment 4                2.2%       2.2%       2.5% +     2.6% +
                           (1.4%)     (1.4%)     (1.4%)     (1.4%)
Controls for demographics  No         No         Yes        Yes
Controls for numeracy      No         No         Yes        Yes
N                          2,285      2,285      2,285      2,285
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

There were no statistically significant differences between the treatments in the proportion who stated they would do nothing. The proportions appeared to be broadly consistent with trial 1, with between 4% and 8% of respondents stating that they would do nothing.

do_nothing_models[[1]] %>%
bar_data() %>%
ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.01) +
    scale_y_continuous(label = percent) +
    hide_legend() +
    labs(x = "", y = "")

# save_plot("do_nothing.png")
# 
# do_nothing_models[[1]] %>%
# bar_data() %>% 
#     bit_hchart(n_y = 0.45) %>% 
#     hc_axis_perc(tickInterval = 2)

6.2 Secondary analysis

6.2.1 Did the treatment increase comprehension of the letter (narrow)?

Measure: Respondents were asked two questions – firstly, what the letter was saying would happen to their energy bills next year (they would pay more because they were losing their discount, they would pay more because prices were rising generally, or they would pay less). Secondly, they were asked what the letter was asking them to do (go to EME, contact their retailer for information to use EME, contact their provider to get a better deal, use a comparison website, or something else).

For this analysis, respondents received 1 point for each comprehension question answered correctly, and the points were summed: 0 if they answered both questions incorrectly, 1 if they answered one of the two correctly, and 2 if they answered both correctly.
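
A minimal sketch of this scoring is shown below, assuming hypothetical 0/1 indicator columns for the two questions; the broad measure in the next section is constructed in the same way over all five questions.

# A minimal sketch of the narrow comprehension score. comp_q1_correct and
# comp_q2_correct are hypothetical 0/1 indicators for the two questions;
# comp_narrow (0, 1 or 2) is the outcome used in the model below.
library(dplyr)

results <- results %>%
    mutate(comp_narrow = comp_q1_correct + comp_q2_correct)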

comp_narrow_models <- results %>% 
    mutate(outcome = comp_narrow) %>% 
    estimate_models()
    
comp_narrow_models %>% 
    print_reg() %>% 
    shorten_table(two_cols = T)
                           OLS        OLS
(Intercept)                1.5 ***    0.9 ***
                           (0.0)      (0.1)
Treatment 2                0.1 *      0.1 **
                           (0.0)      (0.0)
Treatment 3                0.1 *      0.1 *
                           (0.0)      (0.0)
Treatment 4                0.1        0.1
                           (0.0)      (0.0)
Controls for demographics  No         Yes
Controls for numeracy      No         Yes
N                          2,285      2,285
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

Consistent with the results of trial 1, “narrow” comprehension was highest when a headline was included. The two letters with loss aversion headlines (both the dollar amount and the percentage) had the highest levels of comprehension in trial 2. These effects were statistically significant for the narrow measure, as in trial 1.

results %>%
    mutate(outcome = comp_narrow) %>%
    lm(outcome ~ treatment, data = .) %>%
    bar_data() %>%
    mutate(label_text = estimate %>% sprintf("%.1f", .)) %>%
    ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.1) +
    scale_y_continuous() +
    hide_legend() +
    labs(x = "", y = "")

# 
# save_plot("comp_narrow.png")
# 
# 
# 
# results %>%
#     mutate(outcome = comp_narrow) %>%
#     lm(outcome ~ treatment, data = .) %>%
#     bar_data() %>%
#     mutate(estimate = estimate / 100,
#            ymax = ymax / 100,
#            ymin = ymin / 100) %>% 
# bit_hchart(n_y = 0.04) %>% 
#     hc_plotOptions(column = list(dataLabels = list(
#         enabled = TRUE,
#         color = "white",
#         inside = T,
#         verticalAlign = "Top",
#         format = "{y:.1f}",
#         style = list(textOutline = "none")
#         ))) %>% 
# hc_yAxis(labels = list(format = "{value:.1f}"),
#          stackLabels = list(enabled = TRUE),
#          title = "",
#              tickInterval = 0.5) %>% 
#     hc_xAxis(title = "") %>% 
#     hc_tooltip(formatter = JS("function () {return '<b> Treatment ' + (this.x + 1) +
#                 '</b>: ' + this.y}"))

6.2.2 Did the treatment increase comprehension of the letter (broad)?

Measure: Respondents were asked five comprehension questions. The first two made up the “narrow” measure and were the same as in trial 1 (see above). The next three were additional questions, added to further explore comprehension. They covered who ran the EME website (government vs commercial company vs a retailer), what the consumer got from the EME website, and whether they thought the letter included everything they needed to use the EME website.

The score across all five questions constituted the “broad measure”.

For this analysis, respondents received 1 point for each comprehension question answered correctly, and the points were summed, giving a score from 0 (no questions answered correctly) to 5 (all five answered correctly).

comp_broad_models <- results %>% 
    mutate(outcome = comp_broad) %>% 
    estimate_models()
    
comp_broad_models %>% 
    print_reg() %>% 
    shorten_table(two_cols = T)
                           OLS        OLS
(Intercept)                3.4 ***    1.9 ***
                           (0.1)      (0.3)
Treatment 2                0.0        0.1
                           (0.1)      (0.1)
Treatment 3                0.0        0.0
                           (0.1)      (0.1)
Treatment 4                -0.0       0.0
                           (0.1)      (0.1)
Controls for demographics  No         Yes
Controls for numeracy      No         Yes
N                          2,285      2,285
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

There were no statistically significant effects for the “broad” comprehension measure (i.e., all five questions). This suggests that most of the comprehension impact was on the two key questions: what the letter was saying would happen to energy bills, and what the letter was asking the reader to do.

This is consistent with the idea that the headline plays an important role in comprehension: the headline covered information relevant to the “narrow” comprehension questions, but did not necessarily convey information relevant to the additional questions that made up the “broad” measure.

results %>%
    mutate(outcome = comp_broad) %>%
    lm(outcome ~ treatment, data = .) %>%
    bar_data() %>%
    mutate(label_text = estimate %>% sprintf("%.1f", .)) %>%
    ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.4) +
    scale_y_continuous() +
    hide_legend() +
    labs(x = "", y = "")

# 
# save_plot("comp_broad.png")
# 
# results %>%
#     mutate(outcome = comp_broad) %>%
#     lm(outcome ~ treatment, data = .) %>%
#     bar_data() %>%
#     mutate(estimate = estimate / 100,
#            ymax = ymax / 100,
#            ymin = ymin / 100) %>% 
# bit_hchart() %>% 
#         hc_plotOptions(column = list(dataLabels = list(
#         enabled = TRUE,
#         color = "white",
#         inside = T,
#         verticalAlign = "Top",
#         format = "{y:.1f}",
#         style = list(textOutline = "none")
#         ))) %>% 
# hc_yAxis(labels = list(format = "{value:.0f}"),
#          stackLabels = list(enabled = TRUE),
#          title = "",
#          tickInterval = 1,
#          max = 5) %>% 
#     hc_xAxis(title = "") %>% 
#     hc_tooltip(formatter = JS("function () {return '<b> Treatment ' + (this.x + 1) +
#                 '</b>: ' + this.y}"))