Management Training Case

Packages and Data
1) Estimate the average treatment effect (ATE) by calculating the difference in mean engagement scores for treatment vs. control.
Estimate ATE with multiple regression, adjusting for possible confounders. Comment on your results. What does the change in estimated treatment effect say about the bias in your previous estimate?
3) Use coarsened exact matching (CEM) to create synthetic treatment and control groups for estimating the treatment effect. (Because the matched data has discarded both treatment and control units you won’t be estimating the ATE or the ATT—average effect on the treated— but rather the average treatment effect in the remaining matched sample or ATM.) -Create balance tables to compare treatment and control group before and after CEM. -Estimate the treatment effect (ATM).
- CEM
- Balance Table
ATM
4) Use full matching to create weighted synthetic treatment and control groups.
5) Use inverse propensity score weighting in regression to estimate the ATE of the training program.
- Estimate Propensity Scores
- Checking Overlap
Trimming
6) Calculate a 95% confidence interval for the ATE from the previous question. Use the bootstrap.
7) Write up your results.

Packages and Data

library(tidyverse)
library(MatchIt)
library(boot)

management_data <-  read.csv("https://raw.githubusercontent.com/jefftwebb/data/main/management_training.csv")

head(management_data)

##   department_id intervention engagement_score tenure n_of_reports gender role
## 1            76            1             0.28      6            4      2    4
## 2            76            1            -0.45      4            8      2    4
## 3            76            1             0.77      6            4      2    4
## 4            76            1            -0.12      6            4      2    4
## 5            76            1             1.53      6            4      1    4
## 6            76            1             0.01      6            4      2    4
##   last_engagement_score department_score department_size
## 1                  0.61                0             843
## 2                  0.07                0             843
## 3                  0.87                0             843
## 4                  0.03                0             843
## 5                  0.59                0             843
## 6                 -0.39                0             843

colnames(management_data)

##  [1] "department_id"         "intervention"          "engagement_score"     
##  [4] "tenure"                "n_of_reports"          "gender"               
##  [7] "role"                  "last_engagement_score" "department_score"     
## [10] "department_size"

str(management_data)

## 'data.frame':    10391 obs. of  10 variables:
##  $ department_id        : int  76 76 76 76 76 76 76 76 76 76 ...
##  $ intervention         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ engagement_score     : num  0.28 -0.45 0.77 -0.12 1.53 0.01 1.74 0.95 2.37 1.26 ...
##  $ tenure               : int  6 4 6 6 6 6 6 6 6 6 ...
##  $ n_of_reports         : int  4 8 4 4 4 4 4 4 5 4 ...
##  $ gender               : int  2 2 2 2 1 2 1 1 2 1 ...
##  $ role                 : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ last_engagement_score: num  0.61 0.07 0.87 0.03 0.59 -0.39 1.96 3.73 -0.93 0.32 ...
##  $ department_score     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ department_size      : int  843 843 843 843 843 843 843 843 843 843 ...

1) Estimate the average treatment effect (ATE) by calculating the difference in mean engagement scores for treatment vs. control.

# Calculate the mean engagement score for the t and c 
mean_engagement_treated <- management_data %>% 
  filter(intervention == 1) %>% 
  summarise(mean_score = mean(engagement_score, na.rm = TRUE))

mean_engagement_control <- management_data %>% 
  filter(intervention == 0) %>% 
  summarise(mean_score = mean(engagement_score, na.rm = TRUE))

# Calculate the ATE 
ate <- mean_engagement_treated$mean_score - mean_engagement_control$mean_score
paste("The Average Treatment Effect (ATE) is:", round(ate,2))

## [1] "The Average Treatment Effect (ATE) is: 0.43"

Explanation Without accounting for any other factors, the managers who participated in the training had engagement scores that were, on average, 0.43 points higher than those who did not.

Estimate ATE with multiple regression, adjusting for possible confounders. Comment on your results. What does the change in estimated treatment effect say about the bias in your previous estimate?

# Fit the model
model <- lm(engagement_score ~ intervention + tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
            data = management_data)

# View the summary of the model to get the coefficient for intervention
summary(model)

## 
## Call:
## lm(formula = engagement_score ~ intervention + tenure + n_of_reports + 
##     last_engagement_score + department_score + gender + role, 
##     data = management_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4440 -0.6124 -0.0098  0.5783  3.2243 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -1.761955   0.055309 -31.857  < 2e-16 ***
## intervention           0.269784   0.017686  15.254  < 2e-16 ***
## tenure                 0.364649   0.007837  46.530  < 2e-16 ***
## n_of_reports          -0.001758   0.003915  -0.449    0.653    
## last_engagement_score -0.010652   0.006591  -1.616    0.106    
## department_score       0.005447   0.009719   0.560    0.575    
## gender                -0.275734   0.017210 -16.022  < 2e-16 ***
## role                   0.046728   0.006668   7.007 2.58e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8758 on 10383 degrees of freedom
## Multiple R-squared:  0.2335, Adjusted R-squared:  0.233 
## F-statistic: 451.9 on 7 and 10383 DF,  p-value: < 2.2e-16

# Extract the coefficient for 'intervention' which is the adjusted ATE
adjusted_ate <- coef(model)["intervention"]

# Round the adjusted ATE to 2 decimal places
rounded_adjusted_ate <- round(adjusted_ate, 2)
paste("The Adjusted ATE is",rounded_adjusted_ate)

## [1] "The Adjusted ATE is 0.27"

Explanation The unadjusted ATE (calculated earlier) was 0.43, which means that, without accounting for any other factors, the managers who participated in the training had engagement scores that were, on average, 0.43 points higher than those who did not.

However, after adjusting for other factors (like tenure, number of reports, gender, and last engagement score), the adjusted ATE dropped to 0.27. This suggests that some of the initial difference in engagement scores was due to these confounding factors, not just the training itself.

The fact that the adjusted effect is smaller implies that the unadjusted estimate was biased. In other words, the unadjusted ATE overestimated the true effect of the training because it didn’t account for these additional factors, like tenure and gender, which also influence engagement scores.

3) Use coarsened exact matching (CEM) to create synthetic treatment and control groups for estimating the treatment effect. (Because the matched data has discarded both treatment and control units you won’t be estimating the ATE or the ATT—average effect on the treated— but rather the average treatment effect in the remaining matched sample or ATM.) -Create balance tables to compare treatment and control group before and after CEM. -Estimate the treatment effect (ATM).

CEM

m_cem <- matchit(intervention ~ tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
                 data = management_data, 
                 cutpoints = list(tenure = 5, n_of_reports = 5, last_engagement_score = 5, department_score = 5, gender = 2, role = 2), 
                 method = "cem")
summary(m_cem)

## 
## Call:
## matchit(formula = intervention ~ tenure + n_of_reports + last_engagement_score + 
##     department_score + gender + role, data = management_data, 
##     method = "cem", cutpoints = list(tenure = 5, n_of_reports = 5, 
##         last_engagement_score = 5, department_score = 5, gender = 2, 
##         role = 2))
## 
## Summary of Balance for All Data:
##                       Means Treated Means Control Std. Mean Diff. Var. Ratio
## tenure                       5.4742        5.0262          0.4289     0.8138
## n_of_reports                 4.3039        4.3017          0.0010     0.8967
## last_engagement_score        0.1895       -0.0977          0.2092     1.0164
## department_score            -0.0932        0.0387         -0.1364     0.9227
## gender                       1.4755        1.5069         -0.0629     0.9978
## role                         2.4140        2.4874         -0.0544     0.9171
##                       eCDF Mean eCDF Max
## tenure                   0.0640   0.1834
## n_of_reports             0.0186   0.0341
## last_engagement_score    0.0367   0.0967
## department_score         0.0260   0.0786
## gender                   0.0157   0.0314
## role                     0.0259   0.0649
## 
## Summary of Balance for Matched Data:
##                       Means Treated Means Control Std. Mean Diff. Var. Ratio
## tenure                       5.5000        5.4576          0.0406     1.0904
## n_of_reports                 4.3024        4.2879          0.0068     0.9927
## last_engagement_score        0.1905        0.1593          0.0228     0.9919
## department_score            -0.1026       -0.1023         -0.0003     0.9980
## gender                       1.4747        1.4747          0.0000     0.9999
## role                         2.4301        2.4552         -0.0186     0.9735
##                       eCDF Mean eCDF Max Std. Pair Dist.
## tenure                   0.0061   0.0423          0.1812
## n_of_reports             0.0022   0.0115          0.0775
## last_engagement_score    0.0054   0.0332          0.4659
## department_score         0.0001   0.0003          0.0015
## gender                   0.0000   0.0000          0.0000
## role                     0.0071   0.0180          0.3931
## 
## Sample Sizes:
##               Control Treated
## All            4780.     5611
## Matched (ESS)  3296.2    5450
## Matched        4430.     5450
## Unmatched       350.      161
## Discarded         0.        0

Balance Table

# Step 1: Balance Table Before Matching
balance_before <- management_data %>%
  group_by(intervention) %>%
  summarise(
    mean_tenure = mean(tenure, na.rm = TRUE),
    mean_n_of_reports = mean(n_of_reports, na.rm = TRUE),
    mean_last_engagement_score = mean(last_engagement_score, na.rm = TRUE),
    mean_department_score = mean(department_score, na.rm = TRUE),
    mean_gender = mean(gender, na.rm = TRUE),
    mean_role = mean(role, na.rm = TRUE)
  )

# Step 2: Matched Data
matched_data <- match.data(m_cem)

# Step 3: Balance Table After Matching
balance_after <- matched_data %>%
  group_by(intervention) %>%
  summarise(
    mean_tenure = mean(tenure, na.rm = TRUE),
    mean_n_of_reports = mean(n_of_reports, na.rm = TRUE),
    mean_last_engagement_score = mean(last_engagement_score, na.rm = TRUE),
    mean_department_score = mean(department_score, na.rm = TRUE),
    mean_gender = mean(gender, na.rm = TRUE),
    mean_role = mean(role, na.rm = TRUE)
  )

# Step 4: Calculate Standardized Mean Differences (SMD)

smd_before <- (balance_before[2, -1] - balance_before[1, -1]) / apply(management_data[,c('tenure', 'n_of_reports', 'last_engagement_score', 'department_score', 'gender', 'role')], 2, sd)
smd_after <- (balance_after[2, -1] - balance_after[1, -1]) / apply(matched_data[,c('tenure', 'n_of_reports', 'last_engagement_score', 'department_score', 'gender', 'role')], 2, sd)

# Step 5: Combine everything into one table

balance_table <- data.frame(
  Variable = c('Tenure', 'Number of Reports', 'Last Engagement Score', 'Department Score', 'Gender', 'Role'),
  Mean_Before_Treatment = as.numeric(balance_before[2, -1]),
  Mean_Before_Control = as.numeric(balance_before[1, -1]),
  Mean_After_Treatment = as.numeric(balance_after[2, -1]),
  Mean_After_Control = as.numeric(balance_after[1, -1]),
  SMD_Before = as.numeric(smd_before),
  SMD_After = as.numeric(smd_after)
)

# Display the combined balance table
print(balance_table)

##                Variable Mean_Before_Treatment Mean_Before_Control
## 1                Tenure            5.47424701          5.02615063
## 2     Number of Reports            4.30386740          4.30167364
## 3 Last Engagement Score            0.18949563         -0.09774477
## 4      Department Score           -0.09320977          0.03870293
## 5                Gender            1.47549456          1.50690377
## 6                  Role            2.41400820          2.48744770
##   Mean_After_Treatment Mean_After_Control    SMD_Before   SMD_After
## 1            5.5000000        5.146952596  0.3998129634  0.33875145
## 2            4.3023853        4.269074492  0.0009977376  0.01525282
## 3            0.1905486       -0.055582393  0.2088168524  0.18512036
## 4           -0.1025688        0.002257336 -0.1335881984 -0.11001866
## 5            1.4746789        1.501580135 -0.0628280901 -0.05381868
## 6            2.4300917        2.531151242 -0.0532780375 -0.07407405

Explanation

This balance table compares the treatment and control groups before and after Coarsened Exact Matching (CEM). Before matching, there were noticeable differences, particularly in tenure (SMD = 0.3998) and last engagement score (SMD = 0.2088). After matching, the covariate means are more aligned, and the SMDs are reduced, especially for number of reports (SMD = 0.0153). However, some imbalance remains in tenure (SMD = 0.3388) and last engagement score (SMD = 0.1851), though CEM has improved overall balance between the groups.

ATM

# Step 1: lr on the matched data to estimate the ATM
atm_model <- lm(engagement_score ~ intervention + tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
                data = matched_data)

# Step 2: Summary of the regression model 
summary(atm_model)

## 
## Call:
## lm(formula = engagement_score ~ intervention + tenure + n_of_reports + 
##     last_engagement_score + department_score + gender + role, 
##     data = matched_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.40328 -0.60703 -0.01443  0.57524  2.75920 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -1.948427   0.059919 -32.517  < 2e-16 ***
## intervention           0.268558   0.017955  14.958  < 2e-16 ***
## tenure                 0.396666   0.008550  46.391  < 2e-16 ***
## n_of_reports          -0.002086   0.004022  -0.519    0.604    
## last_engagement_score -0.010069   0.006943  -1.450    0.147    
## department_score      -0.002707   0.010398  -0.260    0.795    
## gender                -0.278716   0.017549 -15.882  < 2e-16 ***
## role                   0.053424   0.006977   7.657 2.08e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8709 on 9872 degrees of freedom
## Multiple R-squared:  0.2346, Adjusted R-squared:  0.2341 
## F-statistic: 432.3 on 7 and 9872 DF,  p-value: < 2.2e-16

# Step 4: Extract the coefficient for 'intervention', which is the ATM
atm <- coef(atm_model)["intervention"]
paste("The atm is:", round(atm,2))

## [1] "The atm is: 0.27"

Compared to the earlier unadjusted ATE of 0.43, the adjusted treatment effect in the matched sample (ATM) is 0.27. However, there was little to no change in in previous adjusted ATE. This means that matching didn’t change the treatment effect significantly after adjusting for the same variables.

4) Use full matching to create weighted synthetic treatment and control groups.

Create balance tables to compare treatment and control groups for estimating ATE before and after full matching. Estimate ATE using linear regression. Estimate ATE using G-computation.

Full matching

m_full <- matchit(intervention ~ tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
                  data = management_data, 
                  method = "full")
summary(m_full)

## 
## Call:
## matchit(formula = intervention ~ tenure + n_of_reports + last_engagement_score + 
##     department_score + gender + role, data = management_data, 
##     method = "full")
## 
## Summary of Balance for All Data:
##                       Means Treated Means Control Std. Mean Diff. Var. Ratio
## distance                     0.5631        0.5129          0.4823     0.8363
## tenure                       5.4742        5.0262          0.4289     0.8138
## n_of_reports                 4.3039        4.3017          0.0010     0.8967
## last_engagement_score        0.1895       -0.0977          0.2092     1.0164
## department_score            -0.0932        0.0387         -0.1364     0.9227
## gender                       1.4755        1.5069         -0.0629     0.9978
## role                         2.4140        2.4874         -0.0544     0.9171
##                       eCDF Mean eCDF Max
## distance                 0.1303   0.2039
## tenure                   0.0640   0.1834
## n_of_reports             0.0186   0.0341
## last_engagement_score    0.0367   0.0967
## department_score         0.0260   0.0786
## gender                   0.0157   0.0314
## role                     0.0259   0.0649
## 
## Summary of Balance for Matched Data:
##                       Means Treated Means Control Std. Mean Diff. Var. Ratio
## distance                     0.5631        0.5631          0.0003     1.0003
## tenure                       5.4742        5.4559          0.0175     1.0935
## n_of_reports                 4.3039        4.3270         -0.0108     0.9663
## last_engagement_score        0.1895        0.2352         -0.0333     1.0428
## department_score            -0.0932       -0.1113          0.0187     1.1690
## gender                       1.4755        1.4852         -0.0193     0.9979
## role                         2.4140        2.5972         -0.1356     0.9495
##                       eCDF Mean eCDF Max Std. Pair Dist.
## distance                 0.0006   0.0030          0.0033
## tenure                   0.0062   0.0171          0.3976
## n_of_reports             0.0061   0.0121          1.0604
## last_engagement_score    0.0077   0.0229          0.9204
## department_score         0.0210   0.0540          0.9695
## gender                   0.0048   0.0097          0.9908
## role                     0.0366   0.0982          1.1160
## 
## Sample Sizes:
##               Control Treated
## All           4780.      5611
## Matched (ESS) 1293.87    5611
## Matched       4780.      5611
## Unmatched        0.         0
## Discarded        0.         0

Balance Table

# Before Matching Balance
balance_before_full <- management_data %>%
  group_by(intervention) %>%
  summarise(
    mean_tenure = mean(tenure, na.rm = TRUE),
    mean_n_of_reports = mean(n_of_reports, na.rm = TRUE),
    mean_last_engagement_score = mean(last_engagement_score, na.rm = TRUE),
    mean_department_score = mean(department_score, na.rm = TRUE),
    mean_gender = mean(gender, na.rm = TRUE),
    mean_role = mean(role, na.rm = TRUE)
  )

# Matched Data After Full Matching
matched_data_full <- match.data(m_full)

# After Matching Balance
balance_after_full <- matched_data_full %>%
  group_by(intervention) %>%
  summarise(
    mean_tenure = mean(tenure, na.rm = TRUE),
    mean_n_of_reports = mean(n_of_reports, na.rm = TRUE),
    mean_last_engagement_score = mean(last_engagement_score, na.rm = TRUE),
    mean_department_score = mean(department_score, na.rm = TRUE),
    mean_gender = mean(gender, na.rm = TRUE),
    mean_role = mean(role, na.rm = TRUE)
  )
# Compare SMD before and after matching (optional)
smd_before_full <- (balance_before_full[2, -1] - balance_before_full[1, -1]) / apply(management_data[,c('tenure', 'n_of_reports', 'last_engagement_score', 'department_score', 'gender', 'role')], 2, sd)

smd_after_full <- (balance_after_full[2, -1] - balance_after_full[1, -1]) / apply(matched_data_full[,c('tenure', 'n_of_reports', 'last_engagement_score', 'department_score', 'gender', 'role')], 2, sd)


#Tables 
balance_table_full <- data.frame(
  Variable = c('Tenure', 'Number of Reports', 'Last Engagement Score', 'Department Score', 'Gender', 'Role'),
  Mean_Before_Treatment = as.numeric(balance_before_full[2, -1]),
  Mean_Before_Control = as.numeric(balance_before_full[1, -1]),
  Mean_After_Treatment = as.numeric(balance_after_full[2, -1]),
  Mean_After_Control = as.numeric(balance_after_full[1, -1]),
  SMD_Before = as.numeric(smd_before_full),
  SMD_After = as.numeric(smd_after_full)
)

# Display the combined balance table
print(balance_table_full)

##                Variable Mean_Before_Treatment Mean_Before_Control
## 1                Tenure            5.47424701          5.02615063
## 2     Number of Reports            4.30386740          4.30167364
## 3 Last Engagement Score            0.18949563         -0.09774477
## 4      Department Score           -0.09320977          0.03870293
## 5                Gender            1.47549456          1.50690377
## 6                  Role            2.41400820          2.48744770
##   Mean_After_Treatment Mean_After_Control    SMD_Before     SMD_After
## 1           5.47424701         5.02615063  0.3998129634  0.3998129634
## 2           4.30386740         4.30167364  0.0009977376  0.0009977376
## 3           0.18949563        -0.09774477  0.2088168524  0.2088168524
## 4          -0.09320977         0.03870293 -0.1335881984 -0.1335881984
## 5           1.47549456         1.50690377 -0.0628280901 -0.0628280901
## 6           2.41400820         2.48744770 -0.0532780375 -0.0532780375

Explanation This balance table shows the comparison of means and standardized mean differences (SMD) for several covariates before and after full matching. The means for the treatment and control groups remain the same both before and after matching, indicating that full matching did not alter the balance for these covariates. Additionally, the SMDs before and after matching are identical, suggesting that full matching did not improve covariate balance. This could imply that the dataset was already fairly balanced prior to matching or that full matching was not effective in this case for improving balance.

ATE using Linear Regression

# Lr on matched data using weights
lm_full <- lm(engagement_score ~ intervention + tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
              data = matched_data_full, 
              weights = weights)

# Summary 
summary(lm_full)

## 
## Call:
## lm(formula = engagement_score ~ intervention + tenure + n_of_reports + 
##     last_engagement_score + department_score + gender + role, 
##     data = matched_data_full, weights = weights)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5014 -0.5514 -0.0168  0.5193  9.3823 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -1.982437   0.060232 -32.913   <2e-16 ***
## intervention           0.247203   0.017284  14.303   <2e-16 ***
## tenure                 0.402079   0.008416  47.774   <2e-16 ***
## n_of_reports          -0.001302   0.003983  -0.327   0.7437    
## last_engagement_score -0.011351   0.006596  -1.721   0.0853 .  
## department_score      -0.002342   0.010144  -0.231   0.8174    
## gender                -0.269652   0.017203 -15.674   <2e-16 ***
## role                   0.057762   0.006672   8.657   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8754 on 10383 degrees of freedom
## Multiple R-squared:  0.216,  Adjusted R-squared:  0.2155 
## F-statistic: 408.7 on 7 and 10383 DF,  p-value: < 2.2e-16

# Extract the ATE 
ate_full <- coef(lm_full)["intervention"]
print(paste("The ATE from linear regression is:", round(ate_full, 2)))

## [1] "The ATE from linear regression is: 0.25"

The ATE dropped a tad meaming that the treatment affect was marginally adjusted based on the confounders.

Using G- computation

# G-computation using regression model
g_model <- lm(engagement_score ~ intervention + tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
              data = matched_data_full)

# Predict outcomes for everyone as if they received the treatment 
pred_treated <- predict(g_model, newdata = within(matched_data_full, intervention <- 1))

# Predict outcomes for everyone as if they did not receive the treatment 
pred_control <- predict(g_model, newdata = within(matched_data_full, intervention <- 0))

# Estimate the ATE using G-computation
ate_gcomp <- mean(pred_treated - pred_control)
print(paste("The ATE from G-computation is:", round(ate_gcomp, 2)))

## [1] "The ATE from G-computation is: 0.27"

The ATE has not changed using G-Computation

5) Use inverse propensity score weighting in regression to estimate the ATE of the training program.

Make sure to check overlap in propensity scores with a plot. Trim the data if necessary before estimating the treatment effect.

Estimate Propensity Scores

#Estimate propensity scores using logistic regression
ps_model <- glm(intervention ~ tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
                family = binomial, data = management_data)

# Extract propensity scores
management_data$propensity_score <- predict(ps_model, type = "response")

Checking Overlap

ggplot(management_data, aes(x = propensity_score, fill = factor(intervention))) +
  geom_histogram(binwidth = 0.05, position = "dodge", color = "black") +
  labs(title = "Propensity Score Overlap between Treatment and Control Groups", 
       x = "Propensity Score", fill = "Intervention") +
  scale_fill_manual(values = c("0" = "red", "1" = "blue"), labels = c("Control", "Treatment"))

Explanation

The propensity score plot shows good overlap between the treatment and control groups for scores between 0.3 and 0.7, indicating comparability in this range. However, there is poor overlap at the extremes, with few treated individuals having scores below 0.3 and few control individuals having scores above 0.7. To improve the reliability of the ATE estimate, trimming individuals with extreme propensity scores may be necessary.

Trimming

# Step 1: Define the lower and upper bounds 
lower_bound <- quantile(management_data$propensity_score, 0.05)
upper_bound <- quantile(management_data$propensity_score, 0.95)

# Step 2: Trim the data to include only individuals with propensity scores between the 5th and 95th percentiles
trimmed_data <- management_data %>%
  filter(propensity_score >= lower_bound & propensity_score <= upper_bound)

# Step 3: Re-plot the propensity scores for the trimmed data to check overlap again
ggplot(trimmed_data, aes(x = propensity_score, fill = factor(intervention))) +
  geom_histogram(binwidth = 0.05, position = "dodge", color = "black") +
  labs(title = "Propensity Score Overlap after Trimming", 
       x = "Propensity Score", fill = "Intervention") +
  scale_fill_manual(values = c("0" = "red", "1" = "blue"), labels = c("Control", "Treatment"))

Explanation

This updated propensity score plot, after trimming the extremes, shows improved overlap between the treatment (blue) and control (red) groups within the 0.3 to 0.7 range. Both groups now have more similar distributions of propensity scores, especially between 0.4 and 0.6, where the overlap is most substantial. This suggests that after trimming, the treated and control groups are more comparable, making the treatment effect estimation more reliable. However, there is still some imbalance near the upper range of propensity scores (around 0.7), where there are fewer control units. Overall, the trimming process has successfully reduced the extreme values and increased overlap, which should improve the robustness of the ATE estimation.

# Step 1: Compute inverse propensity weights 
trimmed_data$weight <- ifelse(trimmed_data$intervention == 1, 
                              1 / trimmed_data$propensity_score, 
                              1 / (1 - trimmed_data$propensity_score))

# Step 2: Run a weighted lr
ipsw_model <- lm(engagement_score ~ intervention + tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
                 data = trimmed_data, weights = weight)

# summary
summary(ipsw_model)

## 
## Call:
## lm(formula = engagement_score ~ intervention + tenure + n_of_reports + 
##     last_engagement_score + department_score + gender + role, 
##     data = trimmed_data, weights = weight)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5361 -0.8395 -0.0221  0.8062  4.5298 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -2.246718   0.066953 -33.556  < 2e-16 ***
## intervention           0.246309   0.017853  13.797  < 2e-16 ***
## tenure                 0.454615   0.010365  43.861  < 2e-16 ***
## n_of_reports          -0.004836   0.004078  -1.186    0.236    
## last_engagement_score -0.006771   0.007059  -0.959    0.337    
## department_score       0.003740   0.010096   0.370    0.711    
## gender                -0.277728   0.017863 -15.547  < 2e-16 ***
## role                   0.045128   0.006883   6.557 5.79e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.22 on 9343 degrees of freedom
## Multiple R-squared:  0.2089, Adjusted R-squared:  0.2083 
## F-statistic: 352.4 on 7 and 9343 DF,  p-value: < 2.2e-16

# Extract the ATE 
ate_ipsw <- coef(ipsw_model)["intervention"]
print(paste("The ATE from inverse propensity score weighting is:", round(ate_ipsw, 2)))

## [1] "The ATE from inverse propensity score weighting is: 0.25"

Explanation

The ATE from inverse propensity score weighting (IPSW) is 0.25, meaning the training program increased engagement scores by 0.25 points on average after adjusting for differences in the likelihood of receiving the treatment. This result is consistent with the ATE from the earlier linear regression model, which also estimated the ATE at 0.25, and it is slightly lower than the ATE of 0.27 from the G-computation method. The similarity between these results suggests that, after accounting for confounders, the estimated effect of the training program is robust across different methods. The IPSW method, which further adjusts for differences in propensity to receive treatment, confirms that the training program has a positive effect on engagement scores, and this result is in line with earlier findings.

The other covariates, such as tenure, remain significant predictors of engagement scores, with a coefficient of 0.45, showing a strong positive effect. However, some variables like number of reports and last engagement score are not statistically significant in this model, similar to previous models. Overall, the results reinforce the conclusion that the training program had a measurable positive impact on engagement.

6) Calculate a 95% confidence interval for the ATE from the previous question. Use the bootstrap.

# Step 1: Define a function to calculate the ATE
bootstrap_ate <- function(data, indices) {
  # Resample the data
  resampled_data <- data[indices, ]
  
  # Run the weighted linear regression on the resampled data
  ipsw_model <- lm(engagement_score ~ intervention + tenure + n_of_reports + last_engagement_score + department_score + gender + role, 
                   data = resampled_data, weights = resampled_data$weight)
  
  # Return the ATE 
  return(coef(ipsw_model)["intervention"])
}

# Step 2: Perform bootstrapping
set.seed(123)  
bootstrap_results <- boot(data = trimmed_data, statistic = bootstrap_ate, R = 1000)  

# Step 3: Calculate the 95% confidence interval
boot.ci(bootstrap_results, type = "perc")

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = bootstrap_results, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 0.2122,  0.2837 )  
## Calculations and Intervals on Original Scale

The 95% confidence interval for the ATE using bootstrap resampling is (0.2122, 0.2837). This means that we can be 95% confident that the true average treatment effect of the training program on engagement scores lies between 0.21 and 0.28. The result suggests a positive and statistically significant effect of the program, as the interval does not include zero.

7) Write up your results.

Provide brief background of the case and your analytic objective, along with the challenge posed by confounding. Be specific about the confounding—which variables might be confounders and why? Summarize your approach and results. What is your best estimate of the 95% confidence interval for the treatment effect?

Background and Analytical objective

Nextech, a large multinational hardware manufacturer, implemented a management training program designed to help newly promoted managers improve their leadership skills and team dynamics. The objective of this analysis was to assess the effectiveness of the training program by estimating its impact on engagement scores, which serve as a key outcome measure of team performance.

The challenge in assessing the program’s effectiveness lies in the potential for confounding—managers who attended the training may differ systematically from those who did not, potentially leading to biased estimates of the training effect. Managers with higher levels of prior experience or higher past engagement scores might be more likely to attend training and also have higher engagement scores regardless of the training itself.

Counfounding Variables

Key Potential Confounders

Tenure: managers with more experience might both attend the training and have higher engagement scores because of their experience alone.
Number of direct reports: managers with larger teams may be more likely to attend training due to greater responsibilities.
last engagement scores: managers with higher previous engagement scores may be more motivated to attend training, and their past performance could influence current engagement scores.

Analytic Approach and Results:

I used several causal inference techniques:

Inverse Propensity Score Weighting (IPSW): First, I estimate the propensity scores using logistic regression to capture the probability of attending training based on the confounding variables. I then used these scores to compute weights for a weighted regression model to estimate the true effect.
Trimming: To ensure comparability and exchange-ability, I trimmed the data by excluding individuals with extreme propensity scores (before the 5th and above the 95th percentiles), which reduced potential bias from individuals with extreme probabilities or receiving the treatment.
Bootstrapping: To obtain a robust confidence interval, I used bootstrapping with 1000 replicates to estimate a 95% confidence interval for the average treatment effect (ATE).

The ATE from the inverse propensity score was .25 meaning that the training program increased engagement scores by an average of .25 points after adjusting for confounders.

95% Confidence Interval for the Treatment Effect

The best estimate of the 95% confidence interval for the treatment effect, based on bootstrapping, is (0.2122, 0.2837). This indicates that the training program had a statistically significant positive effect on engagement scores, as the interval does not include zero.

In conclusion, the management training program at Nextech had a positive and significant impact on engagement scores, with the treatment effect estimated to be between 0.21 and 0.28 points. This finding supports the value of the training program for improving managerial effectiveness and team dynamics.

Management Training Case

Scott Silverstein

2024-09-28

Packages and Data

1) Estimate the average treatment effect (ATE) by calculating the difference in mean engagement scores for treatment vs. control.

Estimate ATE with multiple regression, adjusting for possible confounders. Comment on your results. What does the change in estimated treatment effect say about the bias in your previous estimate?

CEM

Balance Table

ATM

4) Use full matching to create weighted synthetic treatment and control groups.

Full matching

Balance Table

ATE using Linear Regression

Using G- computation

5) Use inverse propensity score weighting in regression to estimate the ATE of the training program.

Estimate Propensity Scores

Checking Overlap

Trimming

6) Calculate a 95% confidence interval for the ATE from the previous question. Use the bootstrap.

7) Write up your results.

Background and Analytical objective

Counfounding Variables

Analytic Approach and Results:

95% Confidence Interval for the Treatment Effect