1 Background

BIT worked with the AER to conduct an online framed field experiment to test different versions of a proposed benefit change notification. This trial is the third in a series of three. 2,099 respondents each saw one of three versions of the notification and were asked about their intended behaviour.

Following trial 2, we learned that some retailers may not be able to express the value of the benefit as a dollar figure. We therefore elected to take forward the “lose discount %” arm from trial 2: it did not differ significantly from the “lose discount $” letter, preserving the high level of comprehension and showing no apparent differences on the other key outcome measures. Given that we had found limited differences on our outcome measures in the preceding trials, we decided to include both outcome measures in trial 3. The outcome measure from trial 1 (asking when a person would take action, and what they would do) was asked immediately after the letter, while the outcome measure from trial 2 (offering the opportunity to view EME) was asked after the comprehension questions.

We also chose to test only three arms, instead of four, in order to increase the power of our test. The three arms were the “lose discount %” letter from trial 2, a letter with a “large callout” box just below the headline, and one with a “warning callout” that included the estimated bill for the coming year.

The key findings included:

  • Intention to visit EME did not appear to vary by treatment
  • Comprehension did not appear to vary by treatment, and was consistent with the most effective versions from the previous two trials conducted with the AER.

Overall, these results suggest that we may have been observing ceiling effects, whereby we had reached maximum effectiveness for these measures. This document is intended as a supporting document to the final report, giving technical details of the analysis underpinning these findings.

2 Intervention

Below are the three versions of the letter that were trialled.

[Letter image: Loss headline (%) with call out box]
[Letter image: Loss warning]

3 Estimation strategy

3.1 Statistical model

Before the experiment was run, a number of analyses were pre-specified, reducing our risk of spurious findings. Below, we present the results of all pre-specified analyses (primary and secondary).

For each outcome, we have presented two estimates of the impact of each element of the letter that was varied:

  1. A regression using just the characteristics of the letter as a predictor

\[\text{Outcome}_i = \beta_1 \cdot \text{Letter 1}_i + \beta_2 \cdot \text{Letter 2}_i + \beta_3 \cdot \text{Letter 3}_i + \epsilon_i\]

  2. A regression using the random allocation and a set of covariates to improve precision

\[ \begin{aligned} \text{Outcome}_i = &\beta_1 \cdot \text{Letter 1}_i + \beta_2 \cdot \text{Letter 2}_i + \beta_3 \cdot \text{Letter 3}_i + \\ &\beta_4 \cdot \text{income}_i + \beta_5 \cdot \text{education}_i + \beta_6 \cdot \text{numeracy score}_i + \epsilon_i \end{aligned} \]

Additionally, if the outcome is binary, we have presented the results from both a standard linear regression and the estimated average marginal effect from a logistic regression. All standard errors reported are heteroskedasticity-consistent (‘robust’) standard errors.
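
The estimate_models() helper used in the chunks below is internal to BIT and is not reproduced in this document. As a rough, illustrative sketch only (the exact robust-variance type and model formulas here are assumptions, not the actual implementation), the pair of estimates for a binary outcome could be produced along these lines:

library(sandwich)   # heteroskedasticity-consistent (robust) covariance matrices
library(lmtest)     # coeftest() to report coefficients with robust standard errors
library(margins)    # average marginal effects for the logistic specification

# (1) Unadjusted model: treatment dummies only. 'treatment' is a factor, so
#     Letter 1 is the reference level, matching the (Intercept) rows below.
ols_unadj <- lm(outcome ~ treatment, data = results)
coeftest(ols_unadj, vcov = vcovHC(ols_unadj, type = "HC1"))  # HC1 chosen for illustration

# (2) Covariate-adjusted model to improve precision
ols_adj <- lm(outcome ~ treatment + income_mid + education_alt + num_score,
              data = results)
coeftest(ols_adj, vcov = vcovHC(ols_adj, type = "HC1"))

# For binary outcomes, the companion estimate is the average marginal effect
# of treatment from a logistic regression
logit_adj <- glm(outcome ~ treatment + income_mid + education_alt + num_score,
                 family = binomial(), data = results)
summary(margins(logit_adj, variables = "treatment"))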

For all analyses, we present a regression table with the estimates of interest (standard errors in parentheses) and a bar chart showing the estimates for the three treatment groups.

3.2 Covariates

The covariates used in the model are defined as follows (an illustrative coding sketch follows the list):

  • Income - entered as a numeric variable, taking the midpoint (average value) of the income bracket they reported
  • Education - entered as a categorical variable, with four possible values:
    1. “Did not finish high school”
    2. “High school graduate”
    3. “Undergraduate”
    4. “Post-graduate”
  • Numeracy score - entered as a numeric variable, based on the number of numeracy questions answered correctly; ranges from 0 to 4.
  • Switched provider - categorical variable, which indicates the answer given to “How long have you been with your current energy provider?”, chosen from the options below
    • Less than 1 year
    • Between 1 and 2 years
    • Between 2 and 4 years
    • More than 4 years
    • Don’t know
  • Switched plans - a binary variable, indicating whether they’ve ever switched plans with their current provider
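
The raw survey export is not shown in this document, so the following is a purely illustrative sketch of how these covariates could be constructed; all raw column names, response labels and income bracket values below are hypothetical:

library(dplyr)

results <- raw_survey %>%
    mutate(
        # Income: midpoint (in dollars) of the reported income bracket
        # (bracket labels and midpoints are illustrative only)
        income_mid = case_when(
            income_bracket == "Less than $30,000" ~ 15000,
            income_bracket == "$30,000 - $59,999" ~ 45000,
            income_bracket == "$60,000 - $99,999" ~ 80000,
            income_bracket == "$100,000 or more"  ~ 120000
        ),
        # Education: four-level categorical variable
        education_alt = factor(education_raw,
                               levels = c("Did not finish high school",
                                          "High school graduate",
                                          "Undergraduate",
                                          "Post-graduate")),
        # Numeracy: number of the four numeracy questions answered correctly (0-4)
        num_score = numeracy_q1 + numeracy_q2 + numeracy_q3 + numeracy_q4,
        # Tenure with current provider, kept as a categorical variable
        energy1 = factor(tenure_raw),
        # Ever switched plans with the current provider (binary)
        energy2 = as.integer(switched_plans_raw == "Yes")
    )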

4 Balance

Individuals were randomised to see one of the three versions of the letter. Below are summary statistics by group, showing differences between the treatment groups on the covariates used in the full model. We did not observe any meaningful differences in the covariates.

# Covariates compared across treatment groups in the balance checks below
covars <- c("education_alt", "num_score", "energy1", "energy2")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Education") %>% 
ggplot(aes(value, n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "Education")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy score",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Numeracy score") %>% 
ggplot(aes(fct_rev(value), n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "Numeracy score")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Switch provider") %>% 
    mutate(value = fct_relevel(value, "Less than 1 year", "Between 1 and 2 years", "Between 2 and 4 years", "More than 4 years")) %>% 
ggplot(aes(fct_rev(value), n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "'How long have you been with your current provider?'")

results %>% 
    select(treatment, one_of(covars)) %>% 
    gather(key, value, -treatment) %>%
    mutate(key = case_when(
        .$key == "education_alt" ~ "Education",
        .$key == "num_score" ~ "Numeracy",
        .$key == "energy1" ~ "Switch provider",
        .$key == "energy2" ~ "Switch plan"
    ),
    treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
    filter(key == "Switch plan") %>% 
ggplot(aes(value, n, fill = treatment)) +
    geom_col(position = "dodge") +
    graph_flip() +
    scale_fill_BIT() +
    labs(x = "", title = "'Have you ever switched plans with them?'")

results %>% 
    select(treatment, Income = income_mid) %>% 
    gather(key, value, -treatment) %>%
    mutate(treatment = str_c("Letter ", treatment)) %>% 
    count(treatment, key, value) %>% 
ggplot(aes(value, fct_rev(treatment), height = n, scale = 0.05, fill = treatment)) +
    ggridges::geom_ridgeline( colour = "white") +
    scale_x_continuous(breaks = pretty_breaks(10), label = dollar) +
    scale_fill_BIT() +
    labs(x = "", y = "")

5 Descriptives

# Perceived ease of getting a better deal via each method
# ('How easy do you think it would be to get a better deal through the
# following methods?', rated 1-10)
results %>% 
    select(id, starts_with("difficulty_")) %>% 
    gather(key, val, -id) %>% 
    lm(val ~ key, data = .) %>% 
    bar_data() %>% 
    select(-term, term = old_term) %>% 
        mutate(term = str_replace(term, "keydifficulty_sq00|Treatment ", "")) %>% 
    mutate(
           term = case_when(.$term == "1" ~ "Using a comparison website",
                            .$term == "2" ~ "Using Energy Made Easy",
                            .$term == "3" ~ "Calling my retailer",
                            .$term == "4" ~ "Doing my own research"),
           label_text = estimate %>% sprintf("%.1f", .),
           stars = NA_character_
           ) %>% 
    ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.3) +
    scale_y_continuous(breaks = pretty_breaks(4)) +
    hide_legend() +
    labs(x = "", y = "")

6 Analysis

6.1 Primary analysis

6.1.1 Did the treatment increase the likelihood that consumers would use the EME website?

Measure: Respondents were first asked what they would do on receipt of the letter – take action immediately, within a week, when they had time, or not take action at all. For those that chose any of the options that indicated they would take action, we then asked what action they would take. The options included visiting EME, visiting a non-EME comparison site, calling the retailer, doing research online (not via comparison sites), or something else.

For this analysis, the outcome variable is 1 if they responded that they would go to EME, and 0 otherwise.
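
The derivation of the choose_EME flag from the two survey questions is not shown in this document; a minimal sketch follows (the column names action_timing and action_type, and the response labels, are hypothetical stand-ins for the actual survey variables):

# Assumes dplyr is loaded, as in the chunks above
results <- results %>%
    mutate(
        # 1 if the respondent said they would take some action and chose EME
        choose_EME = as.integer(action_timing != "Would not take action" &
                                    action_type == "Visit Energy Made Easy"),
        # complementary outcome analysed in section 6.1.2
        do_nothing = as.integer(action_timing == "Would not take action")
    )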

EME_models <- results %>% 
    mutate(outcome = choose_EME) %>% 
    estimate_models()
    
EME_models %>% 
    print_reg() %>% 
    shorten_table(percent_table = T)
|                            | OLS       | Logistic | OLS      | Logistic |
|----------------------------|-----------|----------|----------|----------|
| (Intercept)                | 35.1% *** |          | 10.3% *  |          |
|                            | (1.8%)    |          | (5.2%)   |          |
| Treatment 2                | 0.7%      | 0.7%     | 0.5%     | 0.5%     |
|                            | (2.5%)    | (2.5%)   | (2.5%)   | (2.5%)   |
| Treatment 3                | -0.6%     | -0.6%    | -1.4%    | -1.4%    |
|                            | (2.6%)    | (2.6%)   | (2.5%)   | (2.5%)   |
| Controls for demographics  | No        | No       | Yes      | Yes      |
| Controls for numeracy      | No        | No       | Yes      | Yes      |
| N                          | 2,099     | 2,099    | 2,099    | 2,099    |
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

Based on the trial, the proportion of respondents who said they would take some action and visit EME was consistently around 35% and did not vary meaningfully between the letters.

EME_models[[1]] %>%
bar_data() %>%
ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar() +
    scale_y_continuous(label = percent) +
    hide_legend() +
    labs(x = "", y = "")


6.1.2 Did the treatment decrease the likelihood that the consumer would do nothing?

Measure: Respondents were first asked what they would do on receipt of the letter – take action immediately, within a week, when they had time, or not take action at all. For those that chose any of the options that indicated they would take action, we then asked what action they would take. The options included visiting EME, visiting a non-EME comparison site, calling the retailer, doing research online (not via comparison sites), or something else.

For this analysis, the outcome variable is 1 if they responded that they would ‘Do nothing’, and 0 otherwise.

do_nothing_models <- results %>% 
    mutate(outcome = do_nothing) %>% 
    estimate_models()
    
do_nothing_models %>% 
    print_reg() %>% 
    shorten_table(percent_table = T)
|                            | OLS      | Logistic | OLS       | Logistic |
|----------------------------|----------|----------|-----------|----------|
| (Intercept)                | 7.1% *** |          | 14.0% *** |          |
|                            | (0.9%)   |          | (3.2%)    |          |
| Treatment 2                | -1.5%    | -1.5%    | -1.6%     | -1.6%    |
|                            | (1.3%)   | (1.3%)   | (1.3%)    | (1.3%)   |
| Treatment 3                | 0.6%     | 0.6%     | 0.7%      | 0.7%     |
|                            | (1.4%)   | (1.4%)   | (1.4%)    | (1.4%)   |
| Controls for demographics  | No       | No       | Yes       | Yes      |
| Controls for numeracy      | No       | No       | Yes       | Yes      |
| N                          | 2,099    | 2,099    | 2,099     | 2,099    |
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

As with the main outcome measure, there were no statistically significant differences between the treatments in the proportion who stated they would do nothing. Notably, the proportions appeared broadly consistent with trial 1, with between 4% and 8% of respondents stating that they would do nothing.

do_nothing_models[[1]] %>%
bar_data() %>%
ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.01) +
    scale_y_continuous(label = percent, lim = c(0,0.10)) +
    hide_legend() +
    labs(x = "", y = "")


6.2 Secondary analysis

6.2.1 Did the treatment increase comprehension of the letter (narrow)?

Measure: Respondents were asked two questions – firstly, what the letter was saying would happen to their energy bills next year (they would pay more because they were losing their discount, they would pay more because prices were rising generally, or they would pay less). Secondly, they were asked what the letter was asking them to do (go to EME, contact their retailer for information to use EME, contact their provider to get a better deal, use a comparison website, or something else).

For this analysis, respondents received 1 point for each comprehension question answered correctly, and the two scores were summed: 0 if both questions were answered incorrectly, 1 if one of the two was answered correctly, and 2 if both were answered correctly.
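
The scoring can be expressed in a line of code; as a sketch, assuming hypothetical 0/1 indicator columns comp_q1_correct … comp_q5_correct for whether each comprehension question was answered correctly:

# Assumes dplyr is loaded, as in the chunks above
results <- results %>%
    mutate(
        # narrow measure (this section): the two questions described above, 0-2
        comp_narrow = comp_q1_correct + comp_q2_correct,
        # broad measure (section 6.2.2): all five comprehension questions, 0-5
        comp_broad = comp_q1_correct + comp_q2_correct + comp_q3_correct +
            comp_q4_correct + comp_q5_correct
    )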

comp_narrow_models <- results %>% 
    mutate(outcome = comp_narrow) %>% 
    estimate_models()
    
comp_narrow_models %>% 
    print_reg() %>% 
    shorten_table(two_cols = T)
|                            | OLS     | OLS     |
|----------------------------|---------|---------|
| (Intercept)                | 1.6 *** | 0.9 *** |
|                            | (0.0)   | (0.1)   |
| Treatment 2                | 0.0     | 0.0     |
|                            | (0.0)   | (0.0)   |
| Treatment 3                | -0.0    | -0.1 +  |
|                            | (0.0)   | (0.0)   |
| Controls for demographics  | No      | Yes     |
| Controls for numeracy      | No      | Yes     |
| N                          | 2,099   | 2,099   |
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

In trial 3, there were no statistically significant differences in comprehension across the three treatments. The level of comprehension was consistent with the best performing treatments from trials 1 and 2. This suggests we may be observing a “ceiling effect”, whereby we have optimised the current format and headline as far as possible, and further gains would require more substantial changes.

results %>%
    mutate(outcome = comp_narrow) %>%
    lm(outcome ~ treatment, data = .) %>%
    bar_data() %>%
    mutate(label_text = estimate %>% sprintf("%.1f", .)) %>%
    ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.1) +
    scale_y_continuous() +
    hide_legend() +
    labs(x = "", y = "")


6.2.2 Did the treatment increase comprehension of the letter (broad)?

Measure: Respondents were asked five comprehension questions. The first two made up the “narrow measure” and were the same as in trial 1 (see above). The next three were additional questions, added to further explore comprehension. They covered who ran the EME website (government vs commercial company vs a retailer), what the consumer got from the EME website, and whether they thought the letter included everything they needed to use the EME website. The score across all five questions constituted the “broad measure”.

For this analysis, respondents received 1 point for each comprehension question answered correctly, and the five scores were summed, giving a score from 0 (no questions answered correctly) to 5 (all five answered correctly).

comp_broad_models <- results %>% 
    mutate(outcome = comp_broad) %>% 
    estimate_models()
    
comp_broad_models %>% 
    print_reg() %>% 
    shorten_table(two_cols = T)
|                            | OLS     | OLS     |
|----------------------------|---------|---------|
| (Intercept)                | 3.6 *** | 1.7 *** |
|                            | (0.1)   | (0.2)   |
| Treatment 2                | 0.0     | 0.0     |
|                            | (0.1)   | (0.1)   |
| Treatment 3                | -0.1    | -0.1    |
|                            | (0.1)   | (0.1)   |
| Controls for demographics  | No      | Yes     |
| Controls for numeracy      | No      | Yes     |
| N                          | 2,099   | 2,099   |
*** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1.

There were no statistically significant effects for the “broad” comprehension measure (i.e., all five questions, including the new questions added for trials 2 and 3). This suggests that most of the comprehension impact was on the two key questions of what the letter was saying would happen to energy bills, and what the letter was asking the reader to do.

This is consistent with the idea that the headline plays an important role in comprehension: the headline covered information that was relevant to the “narrow” comprehension questions, but did not necessarily convey information relevant to the additional questions that made up the “broad” measure.

results %>%
    mutate(outcome = comp_broad) %>%
    lm(outcome ~ treatment, data = .) %>%
    bar_data() %>%
    mutate(label_text = estimate %>% sprintf("%.1f", .)) %>%
    ggplot(aes(term, estimate, ymax = ymax, ymin = ymin)) +
    bit_bar(-0.3) +
    scale_y_continuous() +
    hide_legend() +
    labs(x = "", y = "")
