Data and Packages

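# Load the packages used in this analysis
library(dplyr)         # filter(), select(), mutate(), rename(), %>%
library(tidyr)         # spread()
library(ggplot2)       # ggplot()
library(CausalImpact)  # CausalImpact()
library(GeoLift)       # GeoDataRead(), GeoLift()
library(gt)            # gt()

data <- read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/offline_marketing_360.csv")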

dictionary <- read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/offline_marketing_360_data_dictionary.csv")
str(data)
## 'data.frame':    128 obs. of  5 variables:
##  $ date     : chr  "2021-07-01" "2021-07-02" "2021-07-03" "2021-07-04" ...
##  $ city     : chr  "Denver" "Denver" "Denver" "Denver" ...
##  $ treated  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ downloads: int  16 18 18 18 17 19 19 18 21 20 ...
##  $ post     : int  0 0 0 0 0 0 0 0 0 0 ...

1) Plot the time series data.

# Convert the date column to date type 
data$date <- as.Date(data$date)

# Create the time series plot
ggplot(data, aes(x = date, y = downloads, color = city)) +
  geom_line() +
  labs(title = "Time Series of Downloads", x = "Date", y = "Number of Downloads") +
  theme_minimal()

Interpretation

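The time series plot shows daily downloads for four cities: Denver, Los Angeles, San Francisco, and Seattle. Denver starts well below the other three (16 downloads on 2021-07-01) and climbs to roughly 30 per day by the end of the month, while the control cities rise only modestly over the same period.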

2) Use DiD to estimate the treatment effect in Denver.

# Run the DiD model using lm()
did_model <- lm(downloads ~ treated * post, data = data)

# Show the model summary
summary(did_model)
## 
## Call:
## lm(formula = downloads ~ treated * post, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4048 -3.8333 -0.8452  4.1667  8.5952 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   33.4048     0.6719  49.715  < 2e-16 ***
## treated      -14.5476     1.3439 -10.825  < 2e-16 ***
## post           2.4286     0.8959   2.711  0.00766 ** 
## treated:post   5.3810     1.7918   3.003  0.00323 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.355 on 124 degrees of freedom
## Multiple R-squared:  0.6181, Adjusted R-squared:  0.6088 
## F-statistic: 66.89 on 3 and 124 DF,  p-value: < 2.2e-16
# Extract and print the DiD estimate (coefficient for 'treated:post')
ate_did <- round(coef(did_model)["treated:post"], 4)

# Print the ATE using cat() with rounding
cat("The ATE from the DiD model is:", ate_did, "\n")
## The ATE from the DiD model is: 5.381

Interpretation

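After the 360 marketing campaign in Denver, downloads rose by an average of 5.38 per day relative to the change observed in the control cities. The estimate is statistically significant (p = 0.00323). The p-value speaks to the evidence for an effect rather than its size, and the estimate is only valid if Denver and the control cities would have followed parallel trends without the campaign.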
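The `treated:post` coefficient can also be reproduced from four group means, which makes the logic of DiD explicit. The following is a minimal base R sketch using the daily values shown in the GeoLift data listing later in this document. The 14/18 day split for `post` is an assumption inferred from the regression output (it is the split that reproduces the coefficients), and the variable names are illustrative.

```r
# Daily downloads for days 1-32, copied from the data listing in this document.
denver <- c(16, 18, 18, 18, 17, 19, 19, 18, 21, 20, 21, 19, 20, 20, 21, 22,
            24, 25, 25, 27, 25, 26, 26, 30, 27, 27, 28, 28, 30, 29, 31, 29)
los_angeles <- c(30, 30, 32, 29, 32, 33, 31, 32, 31, 33, 33, 32, 32, 32, 32, 32,
                 34, 34, 32, 34, 35, 35, 33, 35, 35, 35, 34, 34, 35, 35, 37, 37)
san_francisco <- c(27, 28, 28, 26, 28, 29, 28, 29, 30, 29, 29, 28, 29, 29, 29, 29,
                   32, 32, 30, 31, 32, 31, 31, 31, 31, 32, 31, 32, 33, 33, 33, 33)
seattle <- c(40, 39, 40, 40, 39, 42, 41, 40, 40, 41, 41, 41, 40, 40, 41, 40,
             43, 41, 42, 43, 41, 40, 42, 41, 42, 43, 42, 42, 42, 42, 41, 43)

# Assumption: post = 0 for the first 14 days and 1 for the remaining 18.
post <- rep(c(0, 1), times = c(14, 18))

# Change in the treated city (Denver) from pre to post.
treated_change <- mean(denver[post == 1]) - mean(denver[post == 0])

# Change in the pooled control cities from pre to post.
controls <- cbind(los_angeles, san_francisco, seattle)
control_change <- mean(controls[post == 1, ]) - mean(controls[post == 0, ])

# DiD estimate: treated change minus control change.
did <- treated_change - control_change
cat("DiD from group means:", round(did, 3), "\n")
```

With the values as listed, the result is 5.381, matching the regression coefficient. The inferred split puts the start of the post period on 2021-07-15, four days earlier than the 2021-07-19 start used in the CausalImpact analysis, which is worth checking against the data dictionary.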

3) Estimate the treatment effect in Denver with CausalImpact. Do not use any covariate series for this analysis.

range(as.Date(data$date))
## [1] "2021-07-01" "2021-08-01"
# Filter data for Denver only
denver_data <- data %>% filter(city == "Denver")

# Ensure the date is in Date format
denver_data$date <- as.Date(denver_data$date)

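# Define pre-treatment and post-treatment periods (treatment start taken as 2021-07-19).
# Note: the DiD coefficients above are consistent with `post` switching to 1 on 2021-07-15,
# so the start date should be checked against the data dictionary.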
pre_period <- as.Date(c("2021-07-01", "2021-07-18")) 
post_period <- as.Date(c("2021-07-19", "2021-08-01"))

# Prepare the time series data for CausalImpact 
time_series <- denver_data %>% select(date, downloads)

# Apply CausalImpact to estimate the treatment effect
impact <- CausalImpact(time_series, pre_period, post_period)

# Summary of the results
summary(impact)
## Posterior inference {CausalImpact}
## 
##                          Average      Cumulative   
## Actual                   28           388          
## Prediction (s.d.)        20 (0.8)     275 (11.2)   
## 95% CI                   [18, 21]     [251, 295]   
##                                                    
## Absolute effect (s.d.)   8.1 (0.8)    113.2 (11.2) 
## 95% CI                   [6.7, 9.8]   [93.2, 136.5]
##                                                    
## Relative effect (s.d.)   42% (5.8%)   42% (5.8%)   
## 95% CI                   [32%, 54%]   [32%, 54%]   
## 
## Posterior tail-area probability p:   0.001
## Posterior prob. of a causal effect:  99.8997%
## 
## For more details, type: summary(impact, "report")
# Plot the results
plot(impact)

# Extract the ATE: cumulative absolute effect (AbsEffect[2]) divided by the number of post-period days
ate <- impact$summary$AbsEffect[2] / length(post_period[1]:post_period[2]) 
cat("The Average Treatment Effect (ATE) is:", round(ate, 4), "\n")
## The Average Treatment Effect (ATE) is: 8.0867

Interpretation

Average Effect: The actual average number of downloads during the post-treatment period was 28, while the predicted downloads (what would have happened without the marketing campaign) were 20. This gives an average absolute effect of 8.1 downloads per day.

Cumulative Effect: Over the entire post-treatment period, the marketing campaign generated an additional 113.2 downloads (cumulative absolute effect), compared to the predicted total downloads of 275 without the treatment.

Relative Effect: The treatment resulted in a 42% increase in downloads compared to what was expected without the campaign.

Statistical Significance: The posterior probability of a causal effect is very high (99.8997%), and the posterior tail-area probability is 0.001, so an effect of this size is very unlikely to have arisen by chance.

Visualization:

  1. The top graph shows the actual (black line) vs. predicted (blue dashed line) downloads over time. After the intervention (vertical dashed line), actual downloads increase significantly compared to the prediction.

  2. The middle graph shows the pointwise impact (the daily differences between actual and predicted downloads).

  3. The bottom graph shows the cumulative effect, which grows steadily over the post-treatment period.

In summary, the marketing campaign had a significant positive impact, increasing downloads by an average of 8.1 per day and 42% overall during the post-treatment period.
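The figures in the summary table are linked by simple arithmetic, which is a useful check when reading CausalImpact output. A minimal sketch, using Denver's post-period values from the data listing in this document and the cumulative effect reported above (113.2); the variable names are illustrative:

```r
# Denver daily downloads for the 14 post-period days (2021-07-19 to 2021-08-01),
# copied from the data listing in this document.
denver_post <- c(25, 27, 25, 26, 26, 30, 27, 27, 28, 28, 30, 29, 31, 29)

cumulative_effect <- 113.2           # cumulative absolute effect from summary(impact)
n_days <- length(denver_post)        # number of post-period days

actual_total    <- sum(denver_post)                  # observed downloads
predicted_total <- actual_total - cumulative_effect  # implied counterfactual total
average_effect  <- cumulative_effect / n_days        # per-day effect

cat("Actual total:", actual_total, "\n")
cat("Implied predicted total:", round(predicted_total), "\n")
cat("Average effect per day:", round(average_effect, 1), "\n")
```

The observed total (388), the implied counterfactual (about 275), and the per-day effect (about 8.1) agree with the rounded values in the summary table.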

4) Estimate treatment effect in Denver with CausalImpact using downloads from the control cities—San Francisco, Seattle and Los Angeles—as covariate time series. How does the estimate change?

# Filter the data for Denver and the control cities
control_data <- data %>% filter(city %in% c("San Francisco", "Seattle", "Los Angeles"))
denver_data <- data %>% filter(city == "Denver")

# Reshape control data into a wide format to use as covariates
control_wide <- control_data %>%
  select(date, city, downloads) %>%
  spread(city, downloads)

# Join Denver's downloads with the control-city covariates by date
time_series <- denver_data %>%
  select(date, downloads) %>%
  left_join(control_wide, by = "date")

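# Add a numeric time index and drop the date column.
# Note: CausalImpact treats every column after the first as a covariate,
# so time_index enters the model as a linear-trend covariate alongside the control cities.
time_series_matrix <- time_series %>%
  mutate(time_index = 1:n()) %>%
  select(-date)

# Define pre-treatment and post-treatment periods based on row number.
# Note: the series has 32 rows, so c(19, 31) leaves out the final day (2021-08-01);
# the output below reflects this 13-day post-period.
pre_period <- c(1, 18)
post_period <- c(19, 31)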

# Run CausalImpact with the covariates
impact_with_covariates <- CausalImpact(time_series_matrix, pre_period, post_period)

# Summary of the results
summary(impact_with_covariates)
## Posterior inference {CausalImpact}
## 
##                          Average       Cumulative  
## Actual                   28            359         
## Prediction (s.d.)        25 (0.9)      329 (11.6)  
## 95% CI                   [23, 27]      [304, 350]  
##                                                    
## Absolute effect (s.d.)   2.3 (0.9)     29.9 (11.6) 
## 95% CI                   [0.66, 4.2]   [8.57, 55.2]
##                                                    
## Relative effect (s.d.)   9.2% (3.9%)   9.2% (3.9%) 
## 95% CI                   [2.4%, 18%]   [2.4%, 18%] 
## 
## Posterior tail-area probability p:   0.00503
## Posterior prob. of a causal effect:  99.49698%
## 
## For more details, type: summary(impact, "report")
# Plot the results
plot(impact_with_covariates)

# Extract the cumulative absolute effect (AbsEffect[2]); the per-day average is AbsEffect[1]
cumulative_with_covariates <- impact_with_covariates$summary$AbsEffect[2]
cat("The cumulative absolute effect with covariates is:", round(cumulative_with_covariates, 4), "\n")
## The cumulative absolute effect with covariates is: 29.9488

With the control cities’ downloads (San Francisco, Seattle, and Los Angeles) included as covariates, the results of the CausalImpact analysis change compared to the analysis without covariates.

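Average Effect: Actual downloads averaged 28 per day in the post-treatment period against a predicted 25, for an average absolute effect of 2.3 downloads per day (95% CI [0.66, 4.2]).

Cumulative Effect: Actual downloads totaled 359 against a predicted 329, for a cumulative absolute effect of 29.9 downloads (95% CI [8.57, 55.2]).

Relative Effect: The campaign increased downloads by 9.2% (95% CI [2.4%, 18%]) relative to the prediction.

Statistical Significance: The posterior tail-area probability is 0.00503, and the posterior probability of a causal effect is 99.5%.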

Plots

Top Plot (Original):

Middle Plot (Pointwise):

Bottom Plot (Cumulative):

Conclusion:

Including the control cities as covariates reduced the estimated treatment effect from 8.1 to 2.3 downloads per day. This suggests that much of the increase originally attributed to the campaign reflects broader trends that also appear in the other cities, rather than the campaign alone. The two runs are not strictly comparable, however: the covariate model uses a 13-day post-period (rows 19 to 31), while the model without covariates uses a 14-day one.
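One way to build intuition for why control series shrink the estimate is to replace CausalImpact's Bayesian structural time series model with an ordinary linear regression: fit Denver on the control cities in the pre-period, predict the post-period, and take actual minus predicted. The sketch below is this deliberately simpler stand-in, not the model CausalImpact fits, so its numbers should not be expected to match the output above. The data are copied from the listing in this document, and all names are illustrative.

```r
# Daily downloads for days 1-32, copied from the data listing in this document.
series <- data.frame(
  denver = c(16, 18, 18, 18, 17, 19, 19, 18, 21, 20, 21, 19, 20, 20, 21, 22,
             24, 25, 25, 27, 25, 26, 26, 30, 27, 27, 28, 28, 30, 29, 31, 29),
  los_angeles = c(30, 30, 32, 29, 32, 33, 31, 32, 31, 33, 33, 32, 32, 32, 32, 32,
                  34, 34, 32, 34, 35, 35, 33, 35, 35, 35, 34, 34, 35, 35, 37, 37),
  san_francisco = c(27, 28, 28, 26, 28, 29, 28, 29, 30, 29, 29, 28, 29, 29, 29, 29,
                    32, 32, 30, 31, 32, 31, 31, 31, 31, 32, 31, 32, 33, 33, 33, 33),
  seattle = c(40, 39, 40, 40, 39, 42, 41, 40, 40, 41, 41, 41, 40, 40, 41, 40,
              43, 41, 42, 43, 41, 40, 42, 41, 42, 43, 42, 42, 42, 42, 41, 43)
)

pre  <- 1:18    # same pre-period rows as the CausalImpact call
post <- 19:32   # full post-period, including the final day

# Learn the pre-period relationship between Denver and the control cities.
fit <- lm(denver ~ los_angeles + san_francisco + seattle, data = series[pre, ])

# Project that relationship forward as the counterfactual for Denver.
counterfactual <- predict(fit, newdata = series[post, ])

# Pointwise gap between observed and counterfactual downloads.
pointwise <- series$denver[post] - counterfactual
cat("Mean daily gap (actual - counterfactual):", round(mean(pointwise), 2), "\n")
```

Because the counterfactual now moves with the control cities, any upward drift they share with Denver is absorbed into the prediction instead of being credited to the campaign.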

5) Fit a GeoLift model (using the GeoLift() function) and output the model summary. How do the results differ from those obtained with CausalImpact?

Read in data

geolift_data <- data %>%
  filter(city %in% c("Denver", "San Francisco", "Seattle", "Los Angeles")) %>%
  mutate(treatment = ifelse(city == "Denver" & date >= "2021-07-19", 1, 0))  

# Rename and keep the columns used below: 'time_id', 'location', 'Y', and 'treatment'
geolift_data <- geolift_data %>%
  rename(time_id = date, location = city, Y = downloads) %>%
  select(time_id, location, Y, treatment)

# Ensure that the time is in Date format
geolift_data$time_id <- as.Date(geolift_data$time_id)

# Inspect the structure of geolift_data to verify
str(geolift_data)
## 'data.frame':    128 obs. of  4 variables:
##  $ time_id  : Date, format: "2021-07-01" "2021-07-02" ...
##  $ location : chr  "Denver" "Denver" "Denver" "Denver" ...
##  $ Y        : int  16 18 18 18 17 19 19 18 21 20 ...
##  $ treatment: num  0 0 0 0 0 0 0 0 0 0 ...
# Read and format the data
geo_data <- GeoDataRead(data = geolift_data,
                        date_id = "time_id",      # Date column
                        location_id = "location", # Location column (cities)
                        Y_id = "Y",               # Outcome variable (downloads)
                        X = c(),                  # No covariates
                        format = "yyyy-mm-dd",    # Date format
                        summary = TRUE)

# Display the data summary using gt for a clean table view
geo_data |> gt()
location time Y
denver 1 16
denver 2 18
denver 3 18
denver 4 18
denver 5 17
denver 6 19
denver 7 19
denver 8 18
denver 9 21
denver 10 20
denver 11 21
denver 12 19
denver 13 20
denver 14 20
denver 15 21
denver 16 22
denver 17 24
denver 18 25
denver 19 25
denver 20 27
denver 21 25
denver 22 26
denver 23 26
denver 24 30
denver 25 27
denver 26 27
denver 27 28
denver 28 28
denver 29 30
denver 30 29
denver 31 31
denver 32 29
los angeles 1 30
los angeles 2 30
los angeles 3 32
los angeles 4 29
los angeles 5 32
los angeles 6 33
los angeles 7 31
los angeles 8 32
los angeles 9 31
los angeles 10 33
los angeles 11 33
los angeles 12 32
los angeles 13 32
los angeles 14 32
los angeles 15 32
los angeles 16 32
los angeles 17 34
los angeles 18 34
los angeles 19 32
los angeles 20 34
los angeles 21 35
los angeles 22 35
los angeles 23 33
los angeles 24 35
los angeles 25 35
los angeles 26 35
los angeles 27 34
los angeles 28 34
los angeles 29 35
los angeles 30 35
los angeles 31 37
los angeles 32 37
san francisco 1 27
san francisco 2 28
san francisco 3 28
san francisco 4 26
san francisco 5 28
san francisco 6 29
san francisco 7 28
san francisco 8 29
san francisco 9 30
san francisco 10 29
san francisco 11 29
san francisco 12 28
san francisco 13 29
san francisco 14 29
san francisco 15 29
san francisco 16 29
san francisco 17 32
san francisco 18 32
san francisco 19 30
san francisco 20 31
san francisco 21 32
san francisco 22 31
san francisco 23 31
san francisco 24 31
san francisco 25 31
san francisco 26 32
san francisco 27 31
san francisco 28 32
san francisco 29 33
san francisco 30 33
san francisco 31 33
san francisco 32 33
seattle 1 40
seattle 2 39
seattle 3 40
seattle 4 40
seattle 5 39
seattle 6 42
seattle 7 41
seattle 8 40
seattle 9 40
seattle 10 41
seattle 11 41
seattle 12 41
seattle 13 40
seattle 14 40
seattle 15 41
seattle 16 40
seattle 17 43
seattle 18 41
seattle 19 42
seattle 20 43
seattle 21 41
seattle 22 40
seattle 23 42
seattle 24 41
seattle 25 42
seattle 26 43
seattle 27 42
seattle 28 42
seattle 29 42
seattle 30 42
seattle 31 41
seattle 32 43

Model

# Check the structure of geo_data
str(geo_data)
## 'data.frame':    128 obs. of  3 variables:
##  $ location: chr  "denver" "denver" "denver" "denver" ...
##  $ time    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Y       : int  16 18 18 18 17 19 19 18 21 20 ...
# Ensure Y is numeric
geo_data <- geo_data %>%
  mutate(Y = as.numeric(Y))  # Ensure Y is numeric, no need to modify time

# Check structure again after conversion
str(geo_data)
## 'data.frame':    128 obs. of  3 variables:
##  $ location: chr  "denver" "denver" "denver" "denver" ...
##  $ time    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Y       : num  16 18 18 18 17 19 19 18 21 20 ...
# Fit the GeoLift model
gl_model <- GeoLift(
  Y_id = "Y",                      # Outcome variable (downloads)
  locations = c("denver"),          # Treatment location (denver)
  treatment_start_time = 19,        # Start of the treatment period (index 19)
  treatment_end_time = 31,          # End of the treatment period (index 31 of 32, so the last day is excluded)
  data = geo_data,                  # The formatted dataset
  alpha = 0.05,                     # Significance level (95% confidence)
  stat_test = "Positive")           # One-sided test for a positive effect

# Output the model summary
summary(gl_model)

Average Treatment Effect:

  • GeoLift estimated an average effect (ATT) of 5.056 downloads per day.
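  • CausalImpact estimated a higher effect of 8.1 downloads per day without covariates, but a lower effect of 2.3 downloads per day with the control cities as covariates.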

Cumulative/Incremental Downloads:

  • GeoLift estimated a total of 66 additional downloads due to the treatment.
  • CausalImpact estimated a higher cumulative effect of 113.2 downloads without covariates, and 29.9 downloads with covariates.

Percent Lift:

  • GeoLift calculated a 22.4% increase in downloads due to the treatment.
  • CausalImpact estimated a larger lift of 42% without covariates, and a smaller lift of 9.2% with covariates.

Statistical Significance:

  • Both models found strong evidence of a positive treatment effect. GeoLift reported a p-value of 0, and CausalImpact reported a posterior probability of a causal effect of 99.9% without covariates (99.5% with covariates).

Conclusion:

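While both models found a significant positive impact from the treatment, the size of the estimate depends on how the counterfactual is built. CausalImpact without covariates gave the largest effect, GeoLift, which builds its counterfactual from the control cities, gave a more conservative one, and CausalImpact with the control cities as covariates gave the smallest.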
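The GeoLift figures quoted above can be cross-checked against each other. A minimal sketch, assuming the 13-day test window (indices 19 to 31) set in the GeoLift() call and assuming percent lift is measured relative to the implied counterfactual total; the Denver values come from the data listing, and the variable names are illustrative:

```r
# Denver daily downloads for time indices 19-31, copied from the data listing.
denver_test <- c(25, 27, 25, 26, 26, 30, 27, 27, 28, 28, 30, 29, 31)

att <- 5.056                         # average effect per day reported for GeoLift
n_days <- length(denver_test)        # 13 days in the test window

incremental <- att * n_days                             # total incremental downloads
counterfactual_total <- sum(denver_test) - incremental  # implied downloads without treatment
lift_pct <- 100 * incremental / counterfactual_total    # percent lift over the counterfactual

cat("Incremental downloads:", round(incremental), "\n")
cat("Percent lift:", round(lift_pct, 1), "\n")
```

Under these assumptions the arithmetic gives about 66 incremental downloads and a lift of about 22.4%, in line with the reported summary. It also shows that the cumulative figures for GeoLift (13 days) and CausalImpact without covariates (14 days) cover different windows, so the per-day estimates are the fairer comparison.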

6) Plot the GeoLift model.

# Plot the treatment effects from the GeoLift model
plot(gl_model, type = "ATT")

Interpretation

Pre-Treatment Period (left of the dashed line):

Post-Treatment Period (right of the dashed line):

Shaded Area:

Conclusion: The treatment had a positive and statistically significant impact on downloads in Denver, with the effect stabilizing after an initial increase.

7) Write up the results for FitLife’s single city offline marketing intervention.

GeoLift Model Results

Using the GeoLift model, the treatment effect on Denver was estimated over the post-treatment period (time indices 19 to 31). The key results are as follows:

  • Average effect (ATT): 5.056 downloads per day.
  • Incremental downloads: 66.
  • Percent lift: 22.4%.
  • P-value: 0.

The GeoLift model suggests that the intervention produced a moderate and statistically significant positive effect, with a 22.4% increase in downloads attributed to the offline marketing campaign.

CausalImpact Model Results

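The CausalImpact model, run without covariates, estimated the treatment effect by constructing a counterfactual for Denver from its own pre-treatment trend. The findings from CausalImpact are as follows:

  • Average effect: 8.1 downloads per day (95% CI [6.7, 9.8]).
  • Cumulative effect: 113.2 downloads.
  • Relative effect: 42%.
  • Posterior probability of a causal effect: 99.9%.

CausalImpact provided a higher estimate of the treatment effect than GeoLift, indicating a stronger impact of the offline campaign. With the control cities added as covariates, however, the CausalImpact estimate fell to 2.3 downloads per day.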

Comparison Between Models

Metric                                 GeoLift                   CausalImpact (no covariates)
Average ATT/ATE                        5.056 downloads per day   8.1 downloads per day
Cumulative Effect                      66 downloads              113.2 downloads
Percent Lift                           22.4%                     42%
P-value (GeoLift)                      0                         -
Posterior Probability (CausalImpact)   -                         99.9%

Conclusion

FitLife’s single-city offline marketing campaign in Denver had a positive and statistically significant impact on product downloads, with both the GeoLift and CausalImpact models showing increased downloads during the treatment period. The size of the effect is less certain: estimates range from 2.3 downloads per day (CausalImpact with control-city covariates) through 5.056 (GeoLift) to 8.1 (CausalImpact without covariates). Overall, the evidence supports the conclusion that the campaign drove up downloads and that the 360 marketing strategy was effective, with the GeoLift estimate offering a reasonably conservative figure.
