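# Load the packages used below (tidyverse supplies dplyr, tidyr, and ggplot2)
library(tidyverse)
library(CausalImpact)
library(GeoLift)
library(gt)
data <- read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/offline_marketing_360.csv")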
dictionary <- read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/offline_marketing_360_data_dictionary.csv")
str(data)
## 'data.frame': 128 obs. of 5 variables:
## $ date : chr "2021-07-01" "2021-07-02" "2021-07-03" "2021-07-04" ...
## $ city : chr "Denver" "Denver" "Denver" "Denver" ...
## $ treated : int 1 1 1 1 1 1 1 1 1 1 ...
## $ downloads: int 16 18 18 18 17 19 19 18 21 20 ...
## $ post : int 0 0 0 0 0 0 0 0 0 0 ...
# Convert the date column to date type
data$date <- as.Date(data$date)
# Create the time series plot
ggplot(data, aes(x = date, y = downloads, color = city)) +
geom_line() +
labs(title = "Time Series of Downloads", x = "Date", y = "Number of Downloads") +
theme_minimal()
Interpretation
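The time series plot shows daily downloads from 2021-07-01 to 2021-08-01 for four cities: Denver, Los Angeles, San Francisco, and Seattle. Denver starts lowest, at roughly 16 to 20 downloads per day, and climbs to about 30 by the end of the window, while the three comparison cities rise only modestly.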
# Run the DiD model using lm()
did_model <- lm(downloads ~ treated * post, data = data)
# Show the model summary
summary(did_model)
##
## Call:
## lm(formula = downloads ~ treated * post, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4048 -3.8333 -0.8452 4.1667 8.5952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.4048 0.6719 49.715 < 2e-16 ***
## treated -14.5476 1.3439 -10.825 < 2e-16 ***
## post 2.4286 0.8959 2.711 0.00766 **
## treated:post 5.3810 1.7918 3.003 0.00323 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.355 on 124 degrees of freedom
## Multiple R-squared: 0.6181, Adjusted R-squared: 0.6088
## F-statistic: 66.89 on 3 and 124 DF, p-value: < 2.2e-16
# Extract and print the DiD estimate (coefficient for 'treated:post')
ate_did <- round(coef(did_model)["treated:post"], 4)
# Print the ATE using cat() with rounding
cat("The ATE from the DiD model is:", ate_did, "\n")
## The ATE from the DiD model is: 5.381
Interpretation
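Relative to the three control cities, Denver's downloads rose by an additional 5.38 per day on average in the post period, which the DiD model attributes to the 360 marketing campaign. The interaction term is statistically significant (p = 0.00323), so an effect of this size is unlikely to be due to chance alone. Note that the p-value measures the strength of the evidence for an effect, not the size of the effect, and that the estimate is only causal if Denver would otherwise have followed the same trend as the control cities.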
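The interaction coefficient can be read as a plain difference of differences in group means. The sketch below illustrates this on a tiny made-up panel (the `toy` data frame and its values are assumptions for illustration, not the FitLife data): the hand-computed quantity and the `treated:post` coefficient from `lm()` coincide.

```r
# Toy panel (made-up numbers, NOT the FitLife data): 2 groups x 2 periods.
toy <- data.frame(
  treated   = c(0, 0, 0, 0, 1, 1, 1, 1),
  post      = c(0, 0, 1, 1, 0, 0, 1, 1),
  downloads = c(30, 32, 33, 35, 18, 20, 27, 29)
)

# Mean outcome in each of the four treated/post cells
cell_means <- aggregate(downloads ~ treated + post, data = toy, FUN = mean)
cell_mean <- function(tr, po) {
  cell_means$downloads[cell_means$treated == tr & cell_means$post == po]
}

# Difference-in-differences by hand:
# (treated post - treated pre) - (control post - control pre)
did_by_hand <- (cell_mean(1, 1) - cell_mean(1, 0)) -
  (cell_mean(0, 1) - cell_mean(0, 0))

# The same quantity as the interaction coefficient from lm()
did_from_lm <- unname(
  coef(lm(downloads ~ treated * post, data = toy))["treated:post"]
)

print(c(by_hand = did_by_hand, from_lm = did_from_lm))
```

Applied to the real data, the same decomposition would show which of the four cell means drives the 5.38 estimate.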
range(as.Date(data$date))
## [1] "2021-07-01" "2021-08-01"
# Filter data for Denver only
denver_data <- data %>% filter(city == "Denver")
# Ensure the date is in Date format
denver_data$date <- as.Date(denver_data$date)
# Define pre-treatment and post-treatment periods
pre_period <- as.Date(c("2021-07-01", "2021-07-18"))
post_period <- as.Date(c("2021-07-19", "2021-08-01"))
# Prepare the time series data for CausalImpact
time_series <- denver_data %>% select(date, downloads)
# Apply CausalImpact to estimate the treatment effect
impact <- CausalImpact(time_series, pre_period, post_period)
# Summary of the results
summary(impact)
## Posterior inference {CausalImpact}
##
## Average Cumulative
## Actual 28 388
## Prediction (s.d.) 20 (0.8) 275 (11.2)
## 95% CI [18, 21] [251, 295]
##
## Absolute effect (s.d.) 8.1 (0.8) 113.2 (11.2)
## 95% CI [6.7, 9.8] [93.2, 136.5]
##
## Relative effect (s.d.) 42% (5.8%) 42% (5.8%)
## 95% CI [32%, 54%] [32%, 54%]
##
## Posterior tail-area probability p: 0.001
## Posterior prob. of a causal effect: 99.8997%
##
## For more details, type: summary(impact, "report")
# Plot the results
plot(impact)
# Extract the ATE: cumulative absolute effect divided by the number of post-period days
ate <- impact$summary$AbsEffect[2] / length(post_period[1]:post_period[2])
cat("The Average Treatment Effect (ATE) is:", round(ate, 4), "\n")
## The Average Treatment Effect (ATE) is: 8.0867
Interpretation
Average Effect: The actual average number of downloads during the post-treatment period was 28, while the predicted downloads (what would have happened without the marketing campaign) were 20. This gives an average absolute effect of 8.1 downloads per day.
Cumulative Effect: Over the entire post-treatment period, the marketing campaign generated an additional 113.2 downloads (cumulative absolute effect), compared to the predicted total downloads of 275 without the treatment.
Relative Effect: The treatment resulted in a 42% increase in downloads compared to what was expected without the campaign.
Statistical Significance: The posterior probability of a causal effect is very high (99.8997%). The reported p of 0.001 is a Bayesian posterior tail-area probability rather than a classical p-value: it is the posterior probability of seeing an effect this large by chance if the campaign had no effect.
Visualization:
The top graph shows the actual (black line) vs. predicted (blue dashed line) downloads over time. After the intervention (vertical dashed line), actual downloads increase significantly compared to the prediction.
The middle graph shows the pointwise impact (the daily differences between actual and predicted downloads).
The bottom graph shows the cumulative effect, which grows steadily over the post-treatment period.
In summary, the marketing campaign had a significant positive impact, increasing downloads by an average of 8.1 per day and 42% overall during the post-treatment period.
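The pre- and post-treatment windows above are hard-coded. One way to guard against a mismatch is to derive them from the `post` indicator that ships with the data. The sketch below uses a made-up stand-in for the Denver rows (`toy_denver` and the names ending in `_chk` are hypothetical); only the `date` and `post` column layout is taken from the dataset. This may be worth running on the real data: the standard errors in the DiD output appear consistent with a 14-day pre-period per city (0.6719 is roughly 4.355 / sqrt(42)), which would not match a start date of 2021-07-19.

```r
# Toy stand-in for the Denver rows (made-up values; only the layout matters).
toy_denver <- data.frame(
  date = seq(as.Date("2021-07-01"), by = "day", length.out = 10),
  post = c(rep(0, 6), rep(1, 4))
)

# First date flagged as post-treatment
start_post <- min(toy_denver$date[toy_denver$post == 1])

# Periods in the c(start, end) form that CausalImpact() takes
pre_period_chk  <- c(min(toy_denver$date), start_post - 1)
post_period_chk <- c(start_post, max(toy_denver$date))

# Number of days in each window
n_pre  <- as.integer(diff(pre_period_chk)) + 1L
n_post <- as.integer(diff(post_period_chk)) + 1L

print(pre_period_chk)
print(post_period_chk)
print(c(n_pre = n_pre, n_post = n_post))
```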
# Filter the data for Denver and the control cities
control_data <- data %>% filter(city %in% c("San Francisco", "Seattle", "Los Angeles"))
denver_data <- data %>% filter(city == "Denver")
# Reshape control data into a wide format to use as covariates
control_wide <- control_data %>%
select(date, city, downloads) %>%
spread(city, downloads)
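# Denver's downloads joined with the control-city covariates
time_series <- denver_data %>%
select(date, downloads) %>%
left_join(control_wide, by = "date")
# Replace the date with a numeric index. Note that time_index stays in the
# data, so CausalImpact treats it as an extra (linear trend) covariate.
time_series_matrix <- time_series %>%
mutate(time_index = 1:n()) %>%
select(-date)
# Define pre-treatment and post-treatment periods based on numeric index.
# The series has 32 days, so ending at 31 leaves out the last day (2021-08-01),
# which the date-based analysis above included.
pre_period <- c(1, 18)
post_period <- c(19, 31)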
# Run CausalImpact with the covariates
impact_with_covariates <- CausalImpact(time_series_matrix, pre_period, post_period)
# Summary of the results
summary(impact_with_covariates)
## Posterior inference {CausalImpact}
##
## Average Cumulative
## Actual 28 359
## Prediction (s.d.) 25 (0.9) 329 (11.6)
## 95% CI [23, 27] [304, 350]
##
## Absolute effect (s.d.) 2.3 (0.9) 29.9 (11.6)
## 95% CI [0.66, 4.2] [8.57, 55.2]
##
## Relative effect (s.d.) 9.2% (3.9%) 9.2% (3.9%)
## 95% CI [2.4%, 18%] [2.4%, 18%]
##
## Posterior tail-area probability p: 0.00503
## Posterior prob. of a causal effect: 99.49698%
##
## For more details, type: summary(impact, "report")
# Plot the results
plot(impact_with_covariates)
# Extract the cumulative effect (AbsEffect[2]); the per-day average is AbsEffect[1]
cumulative_effect_with_covariates <- impact_with_covariates$summary$AbsEffect[2]
cat("The cumulative effect with covariates is:", round(cumulative_effect_with_covariates, 4), "\n")
## The cumulative effect with covariates is: 29.9488
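With the control cities’ downloads (San Francisco, Seattle, and Los Angeles) included as covariates, the CausalImpact estimates shrink considerably compared to the analysis without covariates.
Average Effect: Actual downloads averaged 28 per day in the post-treatment period against a predicted 25, an average absolute effect of 2.3 downloads per day (95% CI [0.66, 4.2]).
Cumulative Effect: Denver recorded 359 downloads over the post-treatment window against a predicted 329, a cumulative effect of 29.9 downloads (95% CI [8.57, 55.2]).
Relative Effect: Downloads were 9.2% higher than predicted (95% CI [2.4%, 18%]).
Statistical Significance: The posterior tail-area probability is 0.00503, which corresponds to a posterior probability of a causal effect of about 99.5%.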
Plots
Top Plot (Original):
Middle Plot (Pointwise):
Bottom Plot (Cumulative):
Conclusion:
Including the covariates from the control cities reduced the estimated treatment effect of the campaign. This suggests that part of the increase in downloads originally attributed to the campaign reflects broader trends that also appear in the other cities, rather than the campaign alone. The Average Treatment Effect (ATE) with covariates is 2.3 downloads/day, whereas it was 8.1 downloads/day without covariates. The two runs are not perfectly comparable: the covariate model covers a 13-day post period (index 19 to 31), while the model without covariates covers 14 days (2021-07-19 to 2021-08-01).
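The average and cumulative effects are linked by simple arithmetic: the per-day average is the cumulative effect divided by the number of post-period days. The sketch below checks this with the figures printed in the two summaries; `per_day_effect` is a hypothetical helper introduced here for illustration, not part of CausalImpact.

```r
# Convert a cumulative effect into a per-day average over an index window.
per_day_effect <- function(cumulative, first_index, last_index) {
  n_days <- last_index - first_index + 1
  cumulative / n_days
}

# Values copied from the two summaries above
avg_no_cov   <- per_day_effect(113.2,   19, 32)  # 14 days: 2021-07-19 to 2021-08-01
avg_with_cov <- per_day_effect(29.9488, 19, 31)  # 13 days: index 19 to 31

print(round(c(no_covariates = avg_no_cov, with_covariates = avg_with_cov), 2))
```

Both results line up with the "Average" column of the respective summary (8.1 and 2.3), which confirms that 29.9488 is a cumulative figure rather than a per-day effect.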
geolift_data <- data %>%
filter(city %in% c("Denver", "San Francisco", "Seattle", "Los Angeles")) %>%
mutate(treatment = ifelse(city == "Denver" & date >= as.Date("2021-07-19"), 1, 0))
# Rename to the column names used below: time_id, location, Y (plus treatment)
geolift_data <- geolift_data %>%
rename(time_id = date, location = city, Y = downloads) %>%
select(time_id, location, Y, treatment)
# Ensure that the time is in Date format
geolift_data$time_id <- as.Date(geolift_data$time_id)
# Inspect the structure of geolift_data to verify
str(geolift_data)
## 'data.frame': 128 obs. of 4 variables:
## $ time_id : Date, format: "2021-07-01" "2021-07-02" ...
## $ location : chr "Denver" "Denver" "Denver" "Denver" ...
## $ Y : int 16 18 18 18 17 19 19 18 21 20 ...
## $ treatment: num 0 0 0 0 0 0 0 0 0 0 ...
# Read and format the data
geo_data <- GeoDataRead(data = geolift_data,
date_id = "time_id", # Date column
location_id = "location", # Location column (cities)
Y_id = "Y", # Outcome variable (downloads)
X = c(), # No covariates
format = "yyyy-mm-dd", # Date format
summary = TRUE)
# Display the formatted data (all 128 rows) using gt for a clean table view
geo_data |> gt()
location | time | Y |
---|---|---|
denver | 1 | 16 |
denver | 2 | 18 |
denver | 3 | 18 |
denver | 4 | 18 |
denver | 5 | 17 |
denver | 6 | 19 |
denver | 7 | 19 |
denver | 8 | 18 |
denver | 9 | 21 |
denver | 10 | 20 |
denver | 11 | 21 |
denver | 12 | 19 |
denver | 13 | 20 |
denver | 14 | 20 |
denver | 15 | 21 |
denver | 16 | 22 |
denver | 17 | 24 |
denver | 18 | 25 |
denver | 19 | 25 |
denver | 20 | 27 |
denver | 21 | 25 |
denver | 22 | 26 |
denver | 23 | 26 |
denver | 24 | 30 |
denver | 25 | 27 |
denver | 26 | 27 |
denver | 27 | 28 |
denver | 28 | 28 |
denver | 29 | 30 |
denver | 30 | 29 |
denver | 31 | 31 |
denver | 32 | 29 |
los angeles | 1 | 30 |
los angeles | 2 | 30 |
los angeles | 3 | 32 |
los angeles | 4 | 29 |
los angeles | 5 | 32 |
los angeles | 6 | 33 |
los angeles | 7 | 31 |
los angeles | 8 | 32 |
los angeles | 9 | 31 |
los angeles | 10 | 33 |
los angeles | 11 | 33 |
los angeles | 12 | 32 |
los angeles | 13 | 32 |
los angeles | 14 | 32 |
los angeles | 15 | 32 |
los angeles | 16 | 32 |
los angeles | 17 | 34 |
los angeles | 18 | 34 |
los angeles | 19 | 32 |
los angeles | 20 | 34 |
los angeles | 21 | 35 |
los angeles | 22 | 35 |
los angeles | 23 | 33 |
los angeles | 24 | 35 |
los angeles | 25 | 35 |
los angeles | 26 | 35 |
los angeles | 27 | 34 |
los angeles | 28 | 34 |
los angeles | 29 | 35 |
los angeles | 30 | 35 |
los angeles | 31 | 37 |
los angeles | 32 | 37 |
san francisco | 1 | 27 |
san francisco | 2 | 28 |
san francisco | 3 | 28 |
san francisco | 4 | 26 |
san francisco | 5 | 28 |
san francisco | 6 | 29 |
san francisco | 7 | 28 |
san francisco | 8 | 29 |
san francisco | 9 | 30 |
san francisco | 10 | 29 |
san francisco | 11 | 29 |
san francisco | 12 | 28 |
san francisco | 13 | 29 |
san francisco | 14 | 29 |
san francisco | 15 | 29 |
san francisco | 16 | 29 |
san francisco | 17 | 32 |
san francisco | 18 | 32 |
san francisco | 19 | 30 |
san francisco | 20 | 31 |
san francisco | 21 | 32 |
san francisco | 22 | 31 |
san francisco | 23 | 31 |
san francisco | 24 | 31 |
san francisco | 25 | 31 |
san francisco | 26 | 32 |
san francisco | 27 | 31 |
san francisco | 28 | 32 |
san francisco | 29 | 33 |
san francisco | 30 | 33 |
san francisco | 31 | 33 |
san francisco | 32 | 33 |
seattle | 1 | 40 |
seattle | 2 | 39 |
seattle | 3 | 40 |
seattle | 4 | 40 |
seattle | 5 | 39 |
seattle | 6 | 42 |
seattle | 7 | 41 |
seattle | 8 | 40 |
seattle | 9 | 40 |
seattle | 10 | 41 |
seattle | 11 | 41 |
seattle | 12 | 41 |
seattle | 13 | 40 |
seattle | 14 | 40 |
seattle | 15 | 41 |
seattle | 16 | 40 |
seattle | 17 | 43 |
seattle | 18 | 41 |
seattle | 19 | 42 |
seattle | 20 | 43 |
seattle | 21 | 41 |
seattle | 22 | 40 |
seattle | 23 | 42 |
seattle | 24 | 41 |
seattle | 25 | 42 |
seattle | 26 | 43 |
seattle | 27 | 42 |
seattle | 28 | 42 |
seattle | 29 | 42 |
seattle | 30 | 42 |
seattle | 31 | 41 |
seattle | 32 | 43 |
# Check the structure of geo_data
str(geo_data)
## 'data.frame': 128 obs. of 3 variables:
## $ location: chr "denver" "denver" "denver" "denver" ...
## $ time : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Y : int 16 18 18 18 17 19 19 18 21 20 ...
# Ensure Y is numeric
geo_data <- geo_data %>%
mutate(Y = as.numeric(Y)) # Ensure Y is numeric, no need to modify time
# Check structure again after conversion
str(geo_data)
## 'data.frame': 128 obs. of 3 variables:
## $ location: chr "denver" "denver" "denver" "denver" ...
## $ time : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Y : num 16 18 18 18 17 19 19 18 21 20 ...
# Fit the GeoLift model
gl_model <- GeoLift(
Y_id = "Y", # Outcome variable (downloads)
locations = c("denver"), # Treatment location (denver)
treatment_start_time = 19, # Start of the treatment period (index 19)
treatment_end_time = 31, # End of the treatment period (index 31; the data run to index 32)
data = geo_data, # The formatted dataset
alpha = 0.05, # Significance level (95% confidence)
stat_test = "Positive") # Test for positive impact
# Output the model summary
summary(gl_model)
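Average Treatment Effect: GeoLift estimates an average effect (ATT) of 5.056 downloads per day, compared with 8.1 per day from CausalImpact without covariates.
Cumulative/Incremental Downloads: GeoLift attributes about 66 incremental downloads to the campaign, compared with 113.2 from CausalImpact without covariates.
Percent Lift: 22.4% for GeoLift versus 42% for CausalImpact without covariates.
Statistical Significance: GeoLift reports a p-value of 0; CausalImpact reports a posterior probability of a causal effect of about 99.9%.
Conclusion:
While both models found a significant positive impact from the treatment, CausalImpact without covariates estimated a larger effect on downloads than GeoLift, which provided a more conservative estimate.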
# Plot the treatment effects from the GeoLift model
plot(gl_model, type = "ATT")
Interpretation
Pre-Treatment Period (left of the dashed line):
Post-Treatment Period (right of the dashed line):
Shaded Area:
Conclusion: The treatment had a positive and statistically significant impact on downloads in Denver, with the effect stabilizing after an initial increase.
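Using the GeoLift model, the treatment effect on Denver was estimated over a specified post-treatment period (index 19 to 31). The key results are an average treatment effect (ATT) of 5.056 downloads per day, about 66 incremental downloads over the treatment period, a 22.4% lift, and a reported p-value of 0.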
The GeoLift model suggests that the intervention produced a moderate and statistically significant positive effect, with a 22.4% increase in downloads attributed to the offline marketing campaign.
CausalImpact Model Results
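The CausalImpact model was applied to estimate the treatment effect by constructing a counterfactual for Denver based on pre-treatment trends. Without covariates, it estimates an average effect of 8.1 downloads per day, a cumulative effect of 113.2 downloads, and a 42% relative increase, with a posterior probability of a causal effect of about 99.9%.
Without covariates, CausalImpact provided a higher estimate of the treatment effect than GeoLift. Once the control cities were added as covariates, however, its estimate fell to 2.3 downloads per day, below GeoLift’s.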
Comparison Between Models
Metric | GeoLift | CausalImpact (no covariates) |
---|---|---|
Average ATT/ATE | 5.056 downloads per day | 8.1 downloads per day |
Cumulative Effect | 66 downloads | 113.2 downloads |
Percent Lift | 22.4% | 42% |
P-value (GeoLift) | 0 | n/a |
Posterior Probability (CausalImpact) | n/a | 99.9% |
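The percent lift figures can be roughly reproduced from the other rows, assuming lift is defined as incremental downloads divided by the counterfactual (observed minus incremental). The sketch below is a back-of-the-envelope check under that assumption; `percent_lift` is a hypothetical helper, and the observed totals (359 for index 19 to 31, 388 for the 14-day window) are taken from the CausalImpact summaries earlier in the report.

```r
# Percent lift = incremental / counterfactual, where
# counterfactual = observed total - incremental.
percent_lift <- function(observed_total, incremental) {
  counterfactual <- observed_total - incremental
  100 * incremental / counterfactual
}

# GeoLift: ATT of 5.056 per day over 13 days; Denver's observed total is 359
incremental_geolift <- 5.056 * 13
lift_geolift <- percent_lift(359, incremental_geolift)

# CausalImpact without covariates: 388 observed, 113.2 incremental
lift_causalimpact <- percent_lift(388, 113.2)

print(round(c(GeoLift = lift_geolift, CausalImpact = lift_causalimpact), 1))
```

The GeoLift figure comes out at about 22.4%, matching the table. The CausalImpact figure lands near, but not exactly on, the reported 42%, which is expected if the package derives its relative effect from posterior draws rather than from the rounded summary totals.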
Conclusion
FitLife’s single-city offline marketing campaign in Denver was followed by a positive and statistically significant increase in downloads under every model considered. The size of the effect depends on the method: CausalImpact without covariates gives the largest estimate (8.1 downloads per day), DiD and GeoLift give intermediate estimates (5.38 and 5.056 per day), and CausalImpact with the control cities as covariates gives the smallest (2.3 per day). Overall, the evidence supports the conclusion that the 360 marketing strategy increased downloads, although the magnitude of the increase remains uncertain.