fit life case

Data and Packages
1) Plot the time series data.
2) Use DiD to estimate the treatment effect in Denver.
3) Estimate the treatment effect in Denver with CausalImpact. Do not use any covariate series for this analysis.
4) Estimate treatment effect in Denver with CausalImpact using downloads from the control cities—San Francisco, Seattle and Los Angeles—as covariate time series. How does the estimate change?
5) Fit a GeoLift model (using the GeoLift() function) and output the model summary. How do the results differ from those obtained with CausalImpact?
- read in data
- Model
6) Plot the geolift model
7) Write up the results for FitLife’s single city offline marketing intervention.

Data and Packages

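# Packages used below (tidyverse supplies dplyr, tidyr and ggplot2)
library(tidyverse)
library(CausalImpact)
library(GeoLift)
library(gt)

data <- read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/offline_marketing_360.csv")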

dictionary <- read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/offline_marketing_360_data_dictionary.csv")

str(data)

## 'data.frame':    128 obs. of  5 variables:
##  $ date     : chr  "2021-07-01" "2021-07-02" "2021-07-03" "2021-07-04" ...
##  $ city     : chr  "Denver" "Denver" "Denver" "Denver" ...
##  $ treated  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ downloads: int  16 18 18 18 17 19 19 18 21 20 ...
##  $ post     : int  0 0 0 0 0 0 0 0 0 0 ...

1) Plot the time series data.

# Convert the date column to date type 
data$date <- as.Date(data$date)

# Create the time series plot
ggplot(data, aes(x = date, y = downloads, color = city)) +
  geom_line() +
  labs(title = "Time Series of Downloads", x = "Date", y = "Number of Downloads") +
  theme_minimal()

Interpretation

The time series plot shows the number of downloads over time for four cities: Denver, Los Angeles, San Francisco, and Seattle.

Seattle consistently has the highest number of downloads, at roughly 39–43 per day.
Los Angeles is stable at roughly 29–37 downloads per day, drifting up slightly.
San Francisco stays in a narrow band of roughly 26–33 downloads per day.
Denver starts lowest, at 16–20 downloads per day, but climbs steadily to about 30 by the end of the period, a much steeper rise than in the other three cities.

2) Use DiD to estimate the treatment effect in Denver.

# Run the DiD model using lm()
did_model <- lm(downloads ~ treated * post, data = data)

# Show the model summary
summary(did_model)

## 
## Call:
## lm(formula = downloads ~ treated * post, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4048 -3.8333 -0.8452  4.1667  8.5952 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   33.4048     0.6719  49.715  < 2e-16 ***
## treated      -14.5476     1.3439 -10.825  < 2e-16 ***
## post           2.4286     0.8959   2.711  0.00766 ** 
## treated:post   5.3810     1.7918   3.003  0.00323 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.355 on 124 degrees of freedom
## Multiple R-squared:  0.6181, Adjusted R-squared:  0.6088 
## F-statistic: 66.89 on 3 and 124 DF,  p-value: < 2.2e-16

# Extract and print the DiD estimate (coefficient for 'treated:post')
ate_did <- round(coef(did_model)["treated:post"], 4)

# Print the ATE using cat() with rounding
cat("The ATE from the DiD model is:", ate_did, "\n")

## The ATE from the DiD model is: 5.381

Interpretation

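The coefficient on the interaction term (treated:post) is the DiD estimate: after the 360 marketing campaign, Denver's daily downloads rose by 5.38 more than downloads in the control cities did over the same period. The estimate is statistically significant (p = 0.00323), and it can be read as the treatment effect provided Denver and the control cities would otherwise have followed parallel trends.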
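The interaction coefficient can be rebuilt from the four group means implied by the regression. The sketch below only reuses the coefficients printed in the model summary above; the variable names are illustrative.

```r
# Coefficients copied from the summary(did_model) output above
b0 <- 33.4048   # (Intercept): control cities, pre-period mean
b1 <- -14.5476  # treated: Denver minus controls in the pre-period
b2 <- 2.4286    # post: change in the controls from pre to post
b3 <- 5.3810    # treated:post: the DiD estimate

# Group means implied by the saturated 2x2 model
control_pre  <- b0
control_post <- b0 + b2
denver_pre   <- b0 + b1
denver_post  <- b0 + b1 + b2 + b3

# Difference-in-differences: Denver's change minus the controls' change
did <- (denver_post - denver_pre) - (control_post - control_pre)

cat("Denver pre/post means: ", denver_pre, denver_post, "\n")
cat("Control pre/post means:", control_pre, control_post, "\n")
cat("Difference-in-differences:", did, "\n")
```

Denver's change between periods minus the controls' change reproduces the 5.381 reported by lm().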

3) Estimate the treatment effect in Denver with CausalImpact. Do not use any covariate series for this analysis.

range(as.Date(data$date))

## [1] "2021-07-01" "2021-08-01"
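The analyses below hard-code 2021-07-19 as the first post-treatment day. One quick way to confirm that a hard-coded boundary agrees with the post flag supplied in the data is to read the boundary off the flag itself. The sketch below uses made-up toy rows (not the FitLife data) purely to show the pattern.

```r
# Toy rows standing in for the real data (dates and flags are made up)
toy <- data.frame(
  date = as.Date("2021-07-01") + 0:5,
  post = c(0, 0, 0, 1, 1, 1)
)

# Last pre-treatment day and first post-treatment day according to the flag
last_pre   <- max(toy$date[toy$post == 0])
first_post <- min(toy$date[toy$post == 1])

cat("Last pre-treatment day:  ", format(last_pre), "\n")
cat("First post-treatment day:", format(first_post), "\n")
```

Applied to the real data frame, the same two lines would give the dates to pass as pre_period and post_period.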

# Filter data for Denver only
denver_data <- data %>% filter(city == "Denver")

# Ensure the date is in Date format
denver_data$date <- as.Date(denver_data$date)

# Define pre-treatment and post-treatment periods
pre_period <- as.Date(c("2021-07-01", "2021-07-18")) 
post_period <- as.Date(c("2021-07-19", "2021-08-01"))

# Prepare the time series data for CausalImpact 
time_series <- denver_data %>% select(date, downloads)

# Apply CausalImpact to estimate the treatment effect
impact <- CausalImpact(time_series, pre_period, post_period)

# Summary of the results
summary(impact)

## Posterior inference {CausalImpact}
## 
##                          Average      Cumulative   
## Actual                   28           388          
## Prediction (s.d.)        20 (0.8)     275 (11.2)   
## 95% CI                   [18, 21]     [251, 295]   
##                                                    
## Absolute effect (s.d.)   8.1 (0.8)    113.2 (11.2) 
## 95% CI                   [6.7, 9.8]   [93.2, 136.5]
##                                                    
## Relative effect (s.d.)   42% (5.8%)   42% (5.8%)   
## 95% CI                   [32%, 54%]   [32%, 54%]   
## 
## Posterior tail-area probability p:   0.001
## Posterior prob. of a causal effect:  99.8997%
## 
## For more details, type: summary(impact, "report")

# Plot the results
plot(impact)

# Extract the ATE: the first element of AbsEffect is the average daily effect
ate <- impact$summary$AbsEffect[1]
cat("The Average Treatment Effect (ATE) is:", round(ate, 4), "\n")

## The Average Treatment Effect (ATE) is: 8.0867

Interpretation

Average Effect: The actual average number of downloads during the post-treatment period was 28, while the predicted downloads (what would have happened without the marketing campaign) were 20. This gives an average absolute effect of 8.1 downloads per day.

Cumulative Effect: Over the entire post-treatment period, the marketing campaign generated an additional 113.2 downloads (cumulative absolute effect), compared to the predicted total downloads of 275 without the treatment.

Relative Effect: The treatment resulted in a 42% increase in downloads compared to what was expected without the campaign.

Statistical Significance: The posterior probability of a causal effect is 99.9%, and the posterior tail-area probability is p = 0.001, so a gap of this size between actual and predicted downloads is very unlikely to be chance variation around the forecast.

Visualization:

The top graph shows the actual (black line) vs. predicted (blue dashed line) downloads over time. After the intervention (vertical dashed line), actual downloads increase significantly compared to the prediction.
The middle graph shows the pointwise impact (the daily differences between actual and predicted downloads).
The bottom graph shows the cumulative effect, which grows steadily over the post-treatment period.

In summary, the marketing campaign had a significant positive impact, increasing downloads by an average of 8.1 per day and 42% overall during the post-treatment period.
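The average, cumulative and relative effects in the summary are tied together by simple arithmetic. The sketch below uses the rounded figures printed above, so it matches the reported values only approximately; the variable names are illustrative.

```r
# Rounded figures copied from the summary(impact) output above
actual_total    <- 388   # cumulative actual downloads in the post-period
predicted_total <- 275   # cumulative counterfactual prediction

# Number of days from 2021-07-19 to 2021-08-01, inclusive
n_days <- as.integer(as.Date("2021-08-01") - as.Date("2021-07-19")) + 1

cumulative_effect <- actual_total - predicted_total      # extra downloads in total
average_effect    <- cumulative_effect / n_days          # extra downloads per day
relative_effect   <- cumulative_effect / predicted_total # lift over the counterfactual

cat(sprintf("Days: %d, cumulative: %.0f, per day: %.2f, relative: %.1f%%\n",
            n_days, cumulative_effect, average_effect, 100 * relative_effect))
```

Small differences from the reported 113.2, 8.1 and 42% come from rounding in the printed summary.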

4) Estimate treatment effect in Denver with CausalImpact using downloads from the control cities—San Francisco, Seattle and Los Angeles—as covariate time series. How does the estimate change?

# Filter the data for Denver and the control cities
control_data <- data %>% filter(city %in% c("San Francisco", "Seattle", "Los Angeles"))
denver_data <- data %>% filter(city == "Denver")

# Reshape control data into a wide format to use as covariates
control_wide <- control_data %>%
  select(date, city, downloads) %>%
  spread(city, downloads)

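# Join Denver's downloads (the response) to the control-city covariates
time_series <- denver_data %>%
  select(date, downloads) %>%
  left_join(control_wide, by = "date")

# Drop the date column and add a numeric time index.
# CausalImpact treats every column after the first as a covariate,
# so time_index also enters the model as a regressor.
time_series_matrix <- time_series %>%
  mutate(time_index = 1:n()) %>%
  select(-date)

# Define pre- and post-treatment periods as row indices.
# The series has 32 rows, so this post-period stops one day short of the end.
pre_period <- c(1, 18)
post_period <- c(19, 31)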

# Run CausalImpact with the covariates
impact_with_covariates <- CausalImpact(time_series_matrix, pre_period, post_period)

# Summary of the results
summary(impact_with_covariates)

## Posterior inference {CausalImpact}
## 
##                          Average       Cumulative  
## Actual                   28            359         
## Prediction (s.d.)        25 (0.9)      329 (11.6)  
## 95% CI                   [23, 27]      [304, 350]  
##                                                    
## Absolute effect (s.d.)   2.3 (0.9)     29.9 (11.6) 
## 95% CI                   [0.66, 4.2]   [8.57, 55.2]
##                                                    
## Relative effect (s.d.)   9.2% (3.9%)   9.2% (3.9%) 
## 95% CI                   [2.4%, 18%]   [2.4%, 18%] 
## 
## Posterior tail-area probability p:   0.00503
## Posterior prob. of a causal effect:  99.49698%
## 
## For more details, type: summary(impact, "report")

# Plot the results
plot(impact_with_covariates)

# Extract the ATE: element 1 of AbsEffect is the average daily effect, element 2 is cumulative
ate_with_covariates <- impact_with_covariates$summary$AbsEffect[1]
cat("The Average Treatment Effect (ATE) with covariates is:", round(ate_with_covariates, 4), "\n")

## The Average Treatment Effect (ATE) with covariates is: 2.3038

With the control cities’ downloads (San Francisco, Seattle, and Los Angeles) included as covariates, the results of the CausalImpact analysis change compared to the analysis without covariates.

Average Effect:

Actual downloads during the post-treatment period averaged 28 per day.
The model predicted 25 downloads (based on control cities’ trends).
The average absolute effect of the campaign is now 2.3 more downloads per day (compared to 8.1 in the analysis without covariates).

Cumulative Effect:

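Over the post-treatment period, the cumulative additional downloads are 29.9 (compared to 113.2 without covariates). This run covers 13 post-treatment days (indices 19 to 31) rather than the 14 days used in the analysis without covariates, so the two cumulative figures are not measured over identical windows; the per-day averages are the fairer comparison.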

Relative Effect:

The campaign resulted in a 9.2% increase in downloads (compared to 42% in the previous analysis).

Statistical Significance:

The posterior probability of a causal effect remains very high, at 99.5%, with a posterior tail-area probability of p = 0.005.

Plots

Top Plot (Original):

The actual downloads (black line) closely follow the predicted downloads (blue dashed line) before the intervention. After the campaign starts, actual downloads are slightly higher, indicating a small positive effect.

Middle Plot (Pointwise):

The pointwise differences between actual and predicted downloads become positive after the intervention, but they fluctuate. The effect is modest and within the confidence intervals.

Bottom Plot (Cumulative):

The cumulative effect gradually rises after the campaign, showing a positive overall impact, though the effect is small, with about 30 additional downloads by the end of the period.

Conclusion:

Including the covariates from the control cities reduced the estimated treatment effect of the campaign. This suggests that some of the increase in downloads originally attributed to the campaign might have been influenced by broader trends observed in other cities, rather than the campaign alone. The Average Treatment Effect (ATE) with covariates is 2.3 downloads/day, whereas it was 8.1 downloads/day without covariates.
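When comparing the two CausalImpact runs, it helps to check the post-treatment windows, since the run without covariates is defined by dates and the run with covariates by row indices. The sketch below uses Denver's daily downloads as listed in this report and the cumulative effects from the two summaries; the variable names are illustrative.

```r
# Denver's daily downloads, copied from the data listing in this report
denver <- c(16, 18, 18, 18, 17, 19, 19, 18, 21, 20, 21, 19, 20, 20, 21, 22,
            24, 25, 25, 27, 25, 26, 26, 30, 27, 27, 28, 28, 30, 29, 31, 29)

window_no_cov <- 19:32  # 2021-07-19 to 2021-08-01, as set by the date-based periods
window_cov    <- 19:31  # post_period <- c(19, 31) in the run with covariates

# Cumulative actual downloads in each window
actual_no_cov <- sum(denver[window_no_cov])
actual_cov    <- sum(denver[window_cov])

# Per-day effects from the reported cumulative effects
per_day_no_cov <- 113.2 / length(window_no_cov)
per_day_cov    <- 29.9 / length(window_cov)

cat("Actual totals:", actual_no_cov, "vs", actual_cov, "\n")
cat("Per-day effects:", round(per_day_no_cov, 2), "vs", round(per_day_cov, 2), "\n")
```

The two totals match the "Actual" cumulative rows in the respective summaries, which confirms that the windows differ by one day.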

5) Fit a GeoLift model (using the GeoLift() function) and output the model summary. How do the results differ from those obtained with CausalImpact?

read in data

# Keep the four cities and flag Denver's post-treatment days
geolift_data <- data %>%
  filter(city %in% c("Denver", "San Francisco", "Seattle", "Los Angeles")) %>%
  mutate(treatment = ifelse(city == "Denver" & date >= as.Date("2021-07-19"), 1, 0))

# Rename to the 'time_id', 'location' and 'Y' columns and keep 'treatment'
geolift_data <- geolift_data %>%
  rename(time_id = date, location = city, Y = downloads) %>%
  select(time_id, location, Y, treatment)

# Ensure that the time is in Date format
geolift_data$time_id <- as.Date(geolift_data$time_id)

# Inspect the structure of geolift_data to verify
str(geolift_data)

## 'data.frame':    128 obs. of  4 variables:
##  $ time_id  : Date, format: "2021-07-01" "2021-07-02" ...
##  $ location : chr  "Denver" "Denver" "Denver" "Denver" ...
##  $ Y        : int  16 18 18 18 17 19 19 18 21 20 ...
##  $ treatment: num  0 0 0 0 0 0 0 0 0 0 ...

# Read and format the data
geo_data <- GeoDataRead(data = geolift_data,
                        date_id = "time_id",      # Date column
                        location_id = "location", # Location column (cities)
                        Y_id = "Y",               # Outcome variable (downloads)
                        X = c(),                  # No covariates
                        format = "yyyy-mm-dd",    # Date format
                        summary = TRUE)

# Display the data summary using gt for a clean table view
geo_data |> gt()

location	time	Y
denver	1	16
denver	2	18
denver	3	18
denver	4	18
denver	5	17
denver	6	19
denver	7	19
denver	8	18
denver	9	21
denver	10	20
denver	11	21
denver	12	19
denver	13	20
denver	14	20
denver	15	21
denver	16	22
denver	17	24
denver	18	25
denver	19	25
denver	20	27
denver	21	25
denver	22	26
denver	23	26
denver	24	30
denver	25	27
denver	26	27
denver	27	28
denver	28	28
denver	29	30
denver	30	29
denver	31	31
denver	32	29
los angeles	1	30
los angeles	2	30
los angeles	3	32
los angeles	4	29
los angeles	5	32
los angeles	6	33
los angeles	7	31
los angeles	8	32
los angeles	9	31
los angeles	10	33
los angeles	11	33
los angeles	12	32
los angeles	13	32
los angeles	14	32
los angeles	15	32
los angeles	16	32
los angeles	17	34
los angeles	18	34
los angeles	19	32
los angeles	20	34
los angeles	21	35
los angeles	22	35
los angeles	23	33
los angeles	24	35
los angeles	25	35
los angeles	26	35
los angeles	27	34
los angeles	28	34
los angeles	29	35
los angeles	30	35
los angeles	31	37
los angeles	32	37
san francisco	1	27
san francisco	2	28
san francisco	3	28
san francisco	4	26
san francisco	5	28
san francisco	6	29
san francisco	7	28
san francisco	8	29
san francisco	9	30
san francisco	10	29
san francisco	11	29
san francisco	12	28
san francisco	13	29
san francisco	14	29
san francisco	15	29
san francisco	16	29
san francisco	17	32
san francisco	18	32
san francisco	19	30
san francisco	20	31
san francisco	21	32
san francisco	22	31
san francisco	23	31
san francisco	24	31
san francisco	25	31
san francisco	26	32
san francisco	27	31
san francisco	28	32
san francisco	29	33
san francisco	30	33
san francisco	31	33
san francisco	32	33
seattle	1	40
seattle	2	39
seattle	3	40
seattle	4	40
seattle	5	39
seattle	6	42
seattle	7	41
seattle	8	40
seattle	9	40
seattle	10	41
seattle	11	41
seattle	12	41
seattle	13	40
seattle	14	40
seattle	15	41
seattle	16	40
seattle	17	43
seattle	18	41
seattle	19	42
seattle	20	43
seattle	21	41
seattle	22	40
seattle	23	42
seattle	24	41
seattle	25	42
seattle	26	43
seattle	27	42
seattle	28	42
seattle	29	42
seattle	30	42
seattle	31	41
seattle	32	43

Model

# Check the structure of geo_data
str(geo_data)

## 'data.frame':    128 obs. of  3 variables:
##  $ location: chr  "denver" "denver" "denver" "denver" ...
##  $ time    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Y       : int  16 18 18 18 17 19 19 18 21 20 ...

# Ensure Y is numeric
geo_data <- geo_data %>%
  mutate(Y = as.numeric(Y))  # Ensure Y is numeric, no need to modify time

# Check structure again after conversion
str(geo_data)

## 'data.frame':    128 obs. of  3 variables:
##  $ location: chr  "denver" "denver" "denver" "denver" ...
##  $ time    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Y       : num  16 18 18 18 17 19 19 18 21 20 ...

# Fit the GeoLift model
gl_model <- GeoLift(
  Y_id = "Y",                      # Outcome variable (downloads)
  locations = c("denver"),          # Treatment location (denver)
  treatment_start_time = 19,        # Start of the treatment period (index 19)
  treatment_end_time = 31,          # End of the treatment period (index 31)
  data = geo_data,                  # The formatted dataset
  alpha = 0.05,                     # Significance level (95% confidence)
  stat_test = "Positive")           # Test for positive impact

# Output the model summary
summary(gl_model)

Average Treatment Effect:

GeoLift estimated an average effect on the treated (ATT) of 5.056 downloads per day.
CausalImpact estimated 8.1 downloads per day without covariates and 2.3 with the control cities as covariates.

Cumulative/Incremental Downloads:

GeoLift estimated a total of 66 additional downloads due to the treatment.
CausalImpact estimated 113.2 additional downloads without covariates and 29.9 with covariates.

Percent Lift:

GeoLift calculated a 22.4% increase in downloads due to the treatment.
CausalImpact estimated a 42% lift without covariates and a 9.2% lift with covariates.

Statistical Significance:

All models found strong evidence of a positive treatment effect. GeoLift reported a p-value of 0, which is best read as very small rather than exactly zero, and CausalImpact reported a posterior probability of a causal effect of 99.9% without covariates and 99.5% with covariates.

Conclusion:

All models found a significant positive impact from the treatment, but they disagree on its size. GeoLift's estimate falls between the two CausalImpact estimates: below the model without covariates, which cannot account for trends shared with the other cities, and above the model with covariates.
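The GeoLift figures quoted above are consistent with one another, as the following sketch shows. It assumes that percent lift is the incremental total divided by the counterfactual total (actual minus incremental); that definition is an assumption here, not something taken from the GeoLift output. The actual total is the sum of Denver's downloads for days 19 to 31 in the data listing.

```r
# Figures quoted in this section
att    <- 5.056          # average effect on the treated, downloads per day
n_days <- length(19:31)  # treatment_start_time to treatment_end_time

# Denver's actual downloads over days 19 to 31 (from the data listing)
actual_total <- sum(c(25, 27, 25, 26, 26, 30, 27, 27, 28, 28, 30, 29, 31))

incremental    <- att * n_days               # total extra downloads
counterfactual <- actual_total - incremental # assumed no-campaign total
lift           <- incremental / counterfactual

cat(sprintf("Incremental: %.1f, lift: %.1f%%\n", incremental, 100 * lift))
```

Under that assumption the ATT of 5.056 over 13 days reproduces both the 66 incremental downloads and the 22.4% lift.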

6) Plot the geolift model

# Plot the treatment effects from the GeoLift model
plot(gl_model, type = "ATT")

Interpretation

Pre-Treatment Period (left of the dashed line):

The average effect before the treatment is close to zero, indicating that there was no significant difference between Denver and the control locations before the treatment started.

Post-Treatment Period (right of the dashed line):

After the treatment starts (indicated by the vertical dashed line), the average treatment effect increases. We see a noticeable jump in the effect, with a peak in the days immediately after the treatment, followed by stabilization at a higher level.

Shaded Area:

The shaded region represents the confidence interval around the effect estimates. The treatment effect remains positive throughout the post-treatment period, with the effect gradually stabilizing, but the confidence intervals are wider early in the treatment.

Conclusion: The treatment had a positive and statistically significant impact on downloads in Denver, with the effect stabilizing after an initial increase.

7) Write up the results for FitLife’s single city offline marketing intervention.

GeoLift Model Results

Using the GeoLift model, the treatment effect on Denver was estimated over the post-treatment period (days 19 to 31 of the series). The key results are as follows:

Average Treatment Effect on the Treated (ATT): The average daily lift in downloads was 5.056 downloads.
Percent Lift: The intervention led to a 22.4% increase in downloads in Denver.
Incremental Downloads: The campaign resulted in an additional 66 downloads during the treatment period.
Statistical Significance: The p-value of 0 indicates a highly significant impact of the campaign on downloads.

The GeoLift model suggests that the intervention produced a moderate and statistically significant positive effect, with a 22.4% increase in downloads attributed to the offline marketing campaign.

CausalImpact Model Results

The CausalImpact model was applied to estimate the treatment effect by constructing a counterfactual for Denver from its own pre-treatment series, without covariates. The findings are as follows:

Average Treatment Effect: The model estimated an average lift of 8.1 downloads per day, higher than GeoLift’s estimate.
Cumulative Effect: The intervention resulted in an estimated 113.2 additional downloads over the treatment period.
Percent Lift: The estimated percentage lift in downloads was 42%, nearly double the estimate from GeoLift.
Statistical Significance: The model indicated strong statistical significance, with a 99.9% posterior probability of a causal effect.

Without covariates, CausalImpact gave a higher estimate of the treatment effect than GeoLift. When the control cities were added as covariates, the CausalImpact estimate fell to 2.3 downloads per day (29.9 cumulative, a 9.2% lift), below GeoLift’s estimate.

Comparison Between Models

Metric	GeoLift	CausalImpact (no covariates)	CausalImpact (with covariates)
Average ATT/ATE (downloads per day)	5.056	8.1	2.3
Cumulative effect (downloads)	66	113.2	29.9
Percent lift	22.4%	42%	9.2%
P-value (GeoLift)	0	n/a	n/a
Posterior probability of a causal effect (CausalImpact)	n/a	99.9%	99.5%

Magnitude of Impact: Without covariates, CausalImpact estimated a larger treatment effect than GeoLift, both in daily lift and in cumulative downloads. With the control cities as covariates, its estimate was smaller than GeoLift's.
Percent Lift: GeoLift estimated a 22.4% increase, while CausalImpact estimated 42% without covariates and 9.2% with covariates, so the estimated lift depends heavily on whether and how the control cities are used.
Statistical Significance: All models agree that the treatment effect is positive and statistically significant.

Conclusion

FitLife’s single-city offline marketing campaign in Denver had a positive and statistically significant impact on downloads: DiD, CausalImpact (with and without covariates) and GeoLift all show increased downloads during the treatment period. The size of the effect depends on the model, ranging from 2.3 downloads per day (CausalImpact with covariates) to 8.1 (CausalImpact without covariates), with GeoLift (5.056) and DiD (5.38) in between. Overall, the campaign drove up downloads and the evidence supports the efficacy of the 360 marketing strategy, with the models that use the control cities giving the more conservative estimates.

Scott Silverstein

2024-10-20