Machine Learning · Marketing Analytics

Propensity Modeling for Bank Marketing

ML-driven campaign targeting achieving 93% AUC—increasing conversion rates 3x through precision customer selection.

  • 93% AUC Score (XGBoost)
  • 3x Better Than Random
  • ROI: Optimized Targeting
  • 11.3% Base Conversion Rate

Python · XGBoost · Random Forest · Scikit-Learn · Pandas · Matplotlib · Seaborn · Feature Engineering

Academic Project: ALY6041 - Python & Analytics System Technology | Northeastern University | Dr. Shaiyan Keshvari | June 2025

Model Performance Comparison

Evaluated three classification models—XGBoost emerged as the winner with superior AUC and balanced precision-recall.

ROC Curve Analysis

[Figure: ROC curves showing XGBoost at 93% AUC outperforming other models]

XGBoost (AUC=0.93) outperformed Random Forest (0.92) and Logistic Regression (0.91)
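
For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen subscriber receives a higher score than a randomly chosen non-subscriber. The sketch below computes it that way on toy data; the function name `auc_from_scores` and the sample values are illustrative assumptions, not part of the project code.

```python
def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
print(f"AUC: {auc_from_scores(toy_labels, toy_scores):.2f}")
```

This pairwise version is quadratic in the number of rows, so it is only meant to illustrate the definition; `roc_auc_score` from scikit-learn is the practical choice.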

XGBoost Confusion Matrix

[Figure: Confusion matrix showing balanced classification performance]

The model identifies both subscribers and non-subscribers rather than defaulting to the majority class

Top Predictive Features

[Figure: Feature importance showing Previous Success and Contact Unknown as top predictors]

Previous campaign success, contact method uncertainty, and March timing were strongest predictors

1. Problem

Direct marketing campaigns have notoriously low conversion rates (11.3% in this dataset). Banks waste millions on broad, untargeted outreach, causing both inefficiency and customer fatigue from unwanted calls.

Cost Reality: Contacting 10,000 random customers at $5 each costs $50K and yields about 1,130 conversions. Contacting only the 2,000 highest-propensity customers costs $10K, an 80% reduction in spend, and at the 3x lift reported here still captures roughly 680 of those conversions (about 60%).

2. Solution

Built three classification models on 11,162 customer records with demographic, financial, and campaign attributes. Engineered domain-specific features and applied class weighting to handle the imbalance created by the 11.3% subscription rate.

  • XGBoost with scale_pos_weight for imbalanced data
  • Engineered long_call, age_group, previous_success features
  • Stratified sampling to preserve class distribution

3. Impact

  • 93% AUC with XGBoost (best model)
  • Top 20% targeting yields 3x higher conversion
  • Feature engineering improved AUC from 0.89 → 0.93
  • Identified call duration and previous success as key drivers
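
The "3x higher conversion" figure is a lift: the conversion rate among the top-scored 20% divided by the overall rate. A minimal sketch of that calculation follows, assuming a list of true labels and model scores; `lift_at_top_fraction` and the toy data are hypothetical names and values used only for illustration.

```python
def lift_at_top_fraction(y_true, scores, fraction=0.2):
    """Conversion rate in the top `fraction` of scores divided by the overall rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if sum(y_true) == 0:
        raise ValueError("lift is undefined when there are no conversions")

    # Rank customers from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))

    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy data (hypothetical): 10 customers, 2 of whom subscribed.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(f"Lift in top 20%: {lift_at_top_fraction(toy_labels, toy_scores):.2f}x")
```

On real data the labels and scores would come from the held-out test set, so the lift reflects customers the model has not seen.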

Feature Engineering: Before vs. After

Strategic feature engineering improved model performance across all algorithms. Here's what we built:

long_call (Binary Indicator)

Flags calls longer than the 75th percentile of duration (~496 seconds). The rationale: longer calls suggest greater interest and more room for persuasion, which correlates with conversion.

long_call = 1 if duration > 496 else 0
Impact: +2% AUC improvement

age_group (Categorical)

Segments customers into Young (<30), Middle-aged (30-60), Seniors (>60). Captures non-linear life-stage patterns in financial decision-making.

age_group = 'young' if age < 30 else ('middle' if age <= 60 else 'senior')
Impact: Improved interpretability + 1.5% AUC

previous_success (Binary)

Indicates whether customer responded positively in past campaigns. Brand familiarity strongly predicts future conversion.

previous_success = 1 if (pdays != -1 and poutcome == 'success') else 0
Impact: One of the top 3 most important features

One-Hot Encoding (Categorical)

Converted month, contact method, and other categoricals into binary dummy columns (e.g. month_mar, contact_cellular), since the models require numeric inputs.

pd.get_dummies(df, columns=['month', 'contact'])
Impact: Required for model inputs
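
One practical detail with one-hot encoding is that train and test sets can end up with different dummy columns when a category appears in only one of them. The sketch below shows one way to align them, assuming two small hypothetical frames (`train` and `test`); it is an illustration, not the project's exact preprocessing.

```python
import pandas as pd

# Hypothetical mini-frames: 'mar' and 'unknown' appear only in train, 'telephone' only in test.
train = pd.DataFrame({
    "month": ["mar", "may", "jun"],
    "contact": ["cellular", "unknown", "cellular"],
})
test = pd.DataFrame({
    "month": ["may", "may"],
    "contact": ["cellular", "telephone"],
})

train_encoded = pd.get_dummies(train, columns=["month", "contact"], dtype=int)
test_encoded = pd.get_dummies(test, columns=["month", "contact"], dtype=int)

# Align test to the training columns: dummies missing from test are filled
# with 0, and categories never seen in training are dropped.
test_encoded = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(train_encoded.columns.tolist())
print(test_encoded)
```

Reindexing against the training columns keeps the feature order the model was fitted on, which matters for any estimator that takes a plain matrix.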

AUC Improvement Across Models

Model                 Before   After   Lift
Logistic Regression   0.83     0.85    +2.4%
Random Forest         0.87     0.91    +4.6%
XGBoost               0.89     0.93    +4.5%
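
The lift percentages are relative changes, (after - before) / before, rather than differences in AUC points. A short sketch that recomputes them from the before and after values reported here; the helper name `relative_lift_pct` is an illustrative assumption.

```python
# AUC before and after feature engineering, as reported in this section.
auc_before_after = {
    "Logistic Regression": (0.83, 0.85),
    "Random Forest": (0.87, 0.91),
    "XGBoost": (0.89, 0.93),
}


def relative_lift_pct(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


for model, (before, after) in auc_before_after.items():
    lift = relative_lift_pct(before, after)
    print(f"{model}: {before:.2f} -> {after:.2f} ({lift:+.1f}%)")
```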

Technical Implementation

XGBoost with Class Imbalance Handling

Configured XGBoost with scale_pos_weight to handle the 11.3% subscription rate and prevent model bias toward majority class.

from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, classification_report

# Calculate class imbalance ratio
neg_count = (y_train == 0).sum()
pos_count = (y_train == 1).sum()
scale_pos_weight = neg_count / pos_count  # ~8:1 ratio

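# Train XGBoost with imbalance handling
xgb_model = XGBClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=scale_pos_weight,  # Key parameter for imbalance
    random_state=42,
    eval_metric='auc',
    early_stopping_rounds=10  # Constructor argument in XGBoost >= 1.6; removed from fit() in 2.0
)

# Note: early stopping monitors the test set here, which makes the reported
# AUC slightly optimistic; a separate validation split avoids this.
xgb_model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    verbose=False
)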

# Predictions
y_pred_proba = xgb_model.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_pred_proba)

print(f"XGBoost AUC: {auc_score:.3f}")  # Output: 0.930
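
The predicted probabilities only become a confusion matrix once a decision threshold is chosen. The sketch below shows that step on toy arrays, assuming a 0.5 cutoff; `y_true_toy`, `proba_toy`, and the threshold are hypothetical stand-ins for `y_test`, `y_pred_proba`, and whatever cutoff the campaign budget supports.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Toy stand-ins (hypothetical values) for the true labels and predicted probabilities.
y_true_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])
proba_toy = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.70, 0.15, 0.90])

# Convert probabilities to hard predictions at the chosen threshold.
threshold = 0.5
y_pred_toy = (proba_toy >= threshold).astype(int)

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

print(classification_report(
    y_true_toy, y_pred_toy,
    target_names=["no subscription", "subscription"],
))
```

Lowering the threshold trades precision for recall, which is the same trade-off as widening the share of customers contacted.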

Feature Engineering Pipeline

Created domain-informed features that captured behavioral patterns and temporal signals in customer engagement.

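import pandas as pd
import numpy as np

def engineer_features(df, duration_threshold=None):
    """
    Transform raw features into predictive signals.
    Pass the training-set duration_threshold when transforming test data.
    """
    df = df.copy()  # Avoid mutating the caller's DataFrame

    # Long call indicator (engagement depth)
    if duration_threshold is None:
        duration_threshold = df['duration'].quantile(0.75)
    df['long_call'] = (df['duration'] > duration_threshold).astype(int)

    # Age groups (life stage patterns): <30, 30-60, >60
    df['age_group'] = pd.cut(
        df['age'],
        bins=[0, 30, 61, np.inf],
        right=False,  # Left-closed bins so age 30 falls in 'middle'
        labels=['young', 'middle', 'senior']
    )

    # Previous success indicator (brand familiarity)
    df['previous_success'] = (
        (df['pdays'] != -1) & (df['poutcome'] == 'success')
    ).astype(int)

    # Was contacted before
    df['was_contacted'] = (df['pdays'] != -1).astype(int)

    # One-hot encode categoricals (including the new age_group)
    df = pd.get_dummies(
        df,
        columns=['job', 'marital', 'education', 'month', 'contact', 'age_group'],
        drop_first=True
    )

    return df

# Apply feature engineering, reusing the training threshold on the test set
train_threshold = train_data['duration'].quantile(0.75)
df_train = engineer_features(train_data, train_threshold)
df_test = engineer_features(test_data, train_threshold)

# Align test columns to train in case a category is missing from one split
df_test = df_test.reindex(columns=df_train.columns, fill_value=0)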

Model Evaluation with Stratified Sampling

Used stratified train-test split to maintain class distribution and prevent sampling bias in evaluation.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stratified split (preserves 11.3% subscription rate in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,  # Critical for imbalanced data
    random_state=42
)

# Standardize numerical features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Verify class distribution
print(f"Train subscription rate: {y_train.mean():.3f}")
print(f"Test subscription rate: {y_test.mean():.3f}")
# Both should be ~0.113

Business Impact: Targeted vs. Random Outreach

❌ Random Outreach

Customers Contacted: 10,000
Expected Conversions: 1,130
Conversion Rate: 11.3%
Cost (@ $5/contact): $50,000
Cost per Conversion: $44.25

✅ Model-Driven Targeting

Customers Contacted: 2,000 (top 20%)
Expected Conversions: ~680
Conversion Rate: ~34% (3x the 11.3% base rate)
Cost (@ $5/contact): $10,000
Cost per Conversion: ~$14.75

Bottom Line: At the 3x lift measured in the top 20%, model-driven targeting reduces cost per conversion by about 67% ($44.25 → ~$14.75) and captures roughly 60% of all conversions with 80% fewer contacts.
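
The comparison can be reproduced with a few lines of arithmetic. The sketch below assumes the $5 cost per contact and 11.3% base rate stated here, and that the 3x lift reported for the top 20% holds for the targeted group; `campaign_economics` is a hypothetical helper, not part of the project code.

```python
def campaign_economics(contacts, conversion_rate, cost_per_contact=5.0):
    """Expected conversions, total spend, and cost per conversion for a campaign."""
    if contacts <= 0 or conversion_rate <= 0:
        raise ValueError("contacts and conversion_rate must be positive")
    conversions = contacts * conversion_rate
    total_cost = contacts * cost_per_contact
    return {
        "conversions": conversions,
        "total_cost": total_cost,
        "cost_per_conversion": total_cost / conversions,
    }


base_rate = 0.113

# Random outreach: everyone is contacted at the base conversion rate.
random_outreach = campaign_economics(10_000, base_rate)

# Targeted outreach: top 20% only, assuming the 3x lift holds for that group.
targeted = campaign_economics(2_000, 3 * base_rate)

for name, result in [("Random", random_outreach), ("Targeted", targeted)]:
    print(
        f"{name}: {result['conversions']:.0f} conversions, "
        f"${result['total_cost']:,.0f} spend, "
        f"${result['cost_per_conversion']:.2f} per conversion"
    )
```

Changing the lift or the contacted share shows how sensitive the savings are to model quality, which is worth checking before committing a campaign budget.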

Lessons Learned

Call Duration Was Deceptively Important: I initially expected demographic factors (age, job, education) to dominate, but feature importance revealed call duration as the strongest predictor. The lesson: behavioral signals often carry more predictive power than static attributes.

Class Imbalance Requires More Than SMOTE: I tried SMOTE (synthetic minority oversampling) but found scale_pos_weight in XGBoost more effective. Lesson: in this project, weighting beat resampling for the tree-based models, likely because it leaves the original data distribution untouched instead of adding synthetic rows.
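
To make the weighting idea concrete: setting scale_pos_weight to the negative-to-positive ratio gives the positive class the same total weight as the negative class without adding or removing rows. A minimal sketch of that arithmetic, assuming a hypothetical training set of 10,000 rows at the 11.3% subscription rate; the helper name is illustrative.

```python
def scale_pos_weight_for(n_negative, n_positive):
    """Ratio of negatives to positives, the usual starting value for scale_pos_weight."""
    if n_positive <= 0:
        raise ValueError("need at least one positive example")
    return n_negative / n_positive


# Hypothetical training set: 10,000 rows, 11.3% of them subscribers.
n_positive = 1_130
n_negative = 10_000 - n_positive

weight = scale_pos_weight_for(n_negative, n_positive)

# After weighting, the positives carry the same total weight as the negatives.
weighted_positive_mass = n_positive * weight

print(f"scale_pos_weight: {weight:.2f}")
print(f"weighted positives: {weighted_positive_mass:.0f} vs negatives: {n_negative}")
```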

Feature Engineering Beat Algorithm Selection: XGBoost went from 0.89 → 0.93 AUC with engineered features. The same features improved Logistic Regression from 0.83 → 0.85. In this project, time spent on features paid off for every model.

Previous Campaign Success is Gold: Customers who said "yes" before are 10x more likely to convert again. This validates the marketing wisdom: "your best leads are people who already know you." Banks should maintain detailed campaign response history.

Deployment Strategy for Banks

Phase 1: Pilot (30 days)

  • Deploy model to score next month's campaign prospects
  • Target top 20% with term deposit offers
  • Track conversion rate vs. control group
  • Measure cost savings and ROI lift
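
For the pilot's comparison against a control group, a two-proportion z-test is one simple way to check whether the difference in conversion rates is larger than chance. The sketch below is a minimal version using only the standard library; the function name and the pilot counts are hypothetical, not results from this project.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical pilot counts: 1,000 targeted contacts vs. 1,000 randomly chosen controls.
z, p_value = two_proportion_z_test(conv_a=300, n_a=1_000, conv_b=110, n_b=1_000)
print(f"z = {z:.2f}, p = {p_value:.4g}")
```

The control group should be drawn at random from the same prospect pool, otherwise the comparison measures the selection rather than the model.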

Phase 2: Production (90 days)

  • Integrate model API into CRM system
  • Build real-time scoring dashboard for campaign managers
  • A/B test different propensity score thresholds
  • Expand to other products (credit cards, loans, investment accounts)
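
Before a live A/B test, candidate thresholds can be compared offline on held-out data to narrow the options. The sketch below tabulates contacts, conversions, and spend per threshold, assuming the $5 cost per contact used earlier; `summarize_thresholds` and the toy scores are hypothetical.

```python
def summarize_thresholds(scores, outcomes, thresholds, cost_per_contact=5.0):
    """For each threshold, summarize who would be contacted and what it would cost."""
    rows = []
    for threshold in thresholds:
        # Outcomes of the customers whose score clears this threshold.
        selected = [y for s, y in zip(scores, outcomes) if s >= threshold]
        contacts = len(selected)
        conversions = sum(selected)
        rows.append({
            "threshold": threshold,
            "contacts": contacts,
            "conversions": conversions,
            "conversion_rate": conversions / contacts if contacts else 0.0,
            "spend": contacts * cost_per_contact,
        })
    return rows


# Toy held-out data (hypothetical): propensity scores and observed outcomes.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_outcomes = [1, 1, 0, 1, 0, 0, 0, 0]

summary = summarize_thresholds(toy_scores, toy_outcomes, thresholds=[0.3, 0.5, 0.7])
for row in summary:
    print(
        f"threshold {row['threshold']:.1f}: {row['contacts']} contacts, "
        f"{row['conversions']} conversions, "
        f"rate {row['conversion_rate']:.0%}, spend ${row['spend']:.0f}"
    )
```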