Propensity Modeling for Bank Marketing
ML-driven campaign targeting that reaches 0.93 AUC and roughly triples conversion rates by contacting only the top-scored 20% of customers.
Academic Project: ALY6041 - Python & Analytics System Technology | Northeastern University | Dr. Shaiyan Keshvari | June 2025
Model Performance Comparison
Evaluated three classification models. XGBoost came out ahead, with the highest AUC and a balanced precision-recall trade-off.
ROC Curve Analysis

XGBoost (AUC=0.93) outperformed Random Forest (0.92) and Logistic Regression (0.91)
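To make the AUC figures concrete: an AUC of 0.93 means that a randomly chosen subscriber gets a higher score than a randomly chosen non-subscriber about 93% of the time. The sketch below computes AUC directly from that definition. The function name and the toy labels and scores are illustrative assumptions, not project data.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not taken from the project data).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auc = pairwise_auc(toy_labels, toy_scores)  # 5 of 6 pairs ranked correctly
```

This quadratic loop is only for illustration; on real data, sklearn's roc_auc_score gives the same quantity far more efficiently.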
XGBoost Confusion Matrix

Model accurately identifies both subscribers and non-subscribers
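For readers who want to turn a confusion matrix into headline metrics, here is a minimal sketch. The counts are hypothetical placeholders chosen for easy arithmetic; they are not the values from the XGBoost confusion matrix above.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Of the customers flagged as subscribers, how many actually subscribed
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the actual subscribers, how many the model caught
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Of the actual non-subscribers, how many were correctly left alone
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=80, fp=40, fn=20, tn=860)
```

With an imbalanced target, precision and recall on the subscriber class say more than accuracy, because predicting "no" for everyone already scores high on accuracy.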
Top Predictive Features

Previous campaign success, an unknown contact method, and contact in March were the strongest predictors
1. Problem
Direct marketing campaigns have notoriously low conversion rates (11.3% in this dataset). Banks waste millions on broad, untargeted outreach, causing both inefficiency and customer fatigue from unwanted calls.
2. Solution
Built three classification models on 11,162 customer records with demographic, financial, and campaign attributes. Engineered new features and applied class weighting to handle the 11.3% subscription rate.
- XGBoost with scale_pos_weight for imbalanced data
- Engineered long_call, age_group, previous_success features
- Stratified sampling to preserve class distribution
3. Impact
- 0.93 AUC with XGBoost (best model)
- Top 20% targeting yields 3x higher conversion
- Feature engineering improved AUC from 0.89 → 0.93
- Identified call duration and previous success as key drivers
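The "top 20% yields 3x" figure is a lift measurement: the conversion rate among the highest-scored customers divided by the overall conversion rate. A minimal sketch of that calculation follows. The function name and toy data are assumptions for illustration, not the project's evaluation code.

```python
def top_fraction_lift(labels, scores, fraction=0.2):
    """Conversion rate in the top-scored fraction divided by the base rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal length")
    # Rank customers from highest to lowest predicted propensity
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("no positive labels, lift is undefined")
    return top_rate / base_rate


# Hypothetical toy data: 10 customers, 2 subscribers (20% base rate).
toy_scores = [0.95, 0.90, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20, 0.15, 0.10]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_lift = top_fraction_lift(toy_labels, toy_scores, fraction=0.2)
```

Here the top two customers convert at 50% against a 20% base rate, so the lift is 2.5. On a held-out test set, the same function applied to model scores would give the kind of figure quoted above.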
Feature Engineering: Before vs. After
Strategic feature engineering improved model performance across all algorithms. Here's what we built:
long_call (Binary Indicator)
Flags calls exceeding 75th percentile (~496 seconds). Based on engagement theory: longer calls suggest higher interest and persuasion, correlating with conversion.
long_call = 1 if duration > 496 else 0
age_group (Categorical)
Segments customers into Young (<30), Middle-aged (30-60), Seniors (>60). Captures non-linear life-stage patterns in financial decision-making.
age_group = 'young' if age < 30 else ('middle' if age <= 60 else 'senior')
previous_success (Binary)
Indicates whether customer responded positively in past campaigns. Brand familiarity strongly predicts future conversion.
previous_success = 1 if (pdays != -1 and poutcome == 'success') else 0
One-Hot Encoding (Categorical)
Converted month, contact method, and other categoricals into binary dummies (month_mar, contact_cellular). Enables algorithm compatibility.
pd.get_dummies(df, columns=['month', 'contact'])
AUC Improvement Across Models
Technical Implementation
XGBoost with Class Imbalance Handling
Configured XGBoost with scale_pos_weight to handle the 11.3% subscription rate and prevent model bias toward majority class.
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, classification_report

# Calculate class imbalance ratio
neg_count = (y_train == 0).sum()
pos_count = (y_train == 1).sum()
scale_pos_weight = neg_count / pos_count  # ~8:1 ratio

# Train XGBoost with imbalance handling
xgb_model = XGBClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=scale_pos_weight,  # Key parameter for imbalance
    random_state=42,
    eval_metric='auc',
    # Set on the constructor: XGBoost 2.0 removed this argument from fit()
    early_stopping_rounds=10
)
xgb_model.fit(
    X_train, y_train,
    # Early stopping on the test set makes the reported AUC slightly optimistic;
    # a separate validation split is the safer choice
    eval_set=[(X_test, y_test)],
    verbose=False
)

# Predictions
y_pred_proba = xgb_model.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_pred_proba)
print(f"XGBoost AUC: {auc_score:.3f}")  # Output: 0.930
Feature Engineering Pipeline
Created domain-informed features that captured behavioral patterns and temporal signals in customer engagement.
import pandas as pd
import numpy as np

def engineer_features(df, duration_threshold):
    """
    Transform raw features into predictive signals
    """
    df = df.copy()  # leave the caller's DataFrame untouched

    # Long call indicator (engagement depth)
    df['long_call'] = (df['duration'] > duration_threshold).astype(int)

    # Age groups (life stage patterns): <30, 30-60, >60, assuming whole-year ages
    df['age_group'] = pd.cut(
        df['age'],
        bins=[-np.inf, 29, 60, np.inf],
        labels=['young', 'middle', 'senior']
    )

    # Previous success indicator (brand familiarity)
    df['previous_success'] = (
        (df['pdays'] != -1) & (df['poutcome'] == 'success')
    ).astype(int)

    # Was contacted before
    df['was_contacted'] = (df['pdays'] != -1).astype(int)

    # One-hot encode categoricals, including the new age_group
    df = pd.get_dummies(
        df,
        columns=['job', 'marital', 'education', 'month', 'contact', 'age_group'],
        drop_first=True
    )
    return df

# Compute the 75th-percentile threshold on training data only, then reuse it
duration_threshold = train_data['duration'].quantile(0.75)

# Apply feature engineering
df_train = engineer_features(train_data, duration_threshold)
df_test = engineer_features(test_data, duration_threshold)

# Align test columns with train in case a category is missing from one split
df_test = df_test.reindex(columns=df_train.columns, fill_value=0)
Model Evaluation with Stratified Sampling
Used stratified train-test split to maintain class distribution and prevent sampling bias in evaluation.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stratified split (preserves 11.3% subscription rate in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,  # Critical for imbalanced data
    random_state=42
)

# Standardize numerical features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Verify class distribution
print(f"Train subscription rate: {y_train.mean():.3f}")
print(f"Test subscription rate: {y_test.mean():.3f}")
# Both should be ~0.113
Business Impact: Targeted vs. Random Outreach
❌ Random Outreach
✅ Model-Driven Targeting
Bottom Line: Model-driven targeting reduces cost per conversion by 81% ($44.25 → $8.33) while maintaining similar total conversions with 80% fewer contacts.
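The arithmetic behind the bottom line can be checked in a few lines. Cost per conversion is the cost per contact divided by the conversion rate. The $5.00 cost per contact below is an assumption of mine, not a figure stated in this write-up; it is used only because it reproduces the $44.25 random-outreach number at an 11.3% conversion rate.

```python
def cost_per_conversion(cost_per_contact, conversion_rate):
    """Expected spend needed to win one conversion."""
    if conversion_rate <= 0:
        raise ValueError("conversion_rate must be positive")
    return cost_per_contact / conversion_rate


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Figures quoted in the bottom line above.
reduction = percent_reduction(44.25, 8.33)  # about 81%

# ASSUMPTION: $5.00 per contact (hypothetical, not stated in the source).
ASSUMED_COST_PER_CONTACT = 5.00
random_cost = cost_per_conversion(ASSUMED_COST_PER_CONTACT, 0.113)
```

Swapping in a bank's real per-contact cost and observed conversion rates would give the corresponding figures for an actual campaign.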
Lessons Learned
Call Duration Was Deceptively Important: I initially expected demographic factors (age, job, education) to dominate, but feature importance revealed call duration as the strongest predictor. This taught me that behavioral signals often outperform static attributes in predictive power.
Class Imbalance Requires More Than SMOTE: I tried SMOTE (Synthetic Minority Oversampling Technique) but found scale_pos_weight in XGBoost more effective. Lesson: for the tree-based models in this project, weighting beat resampling, plausibly because it reweights the observed rows instead of adding synthetic ones.
Feature Engineering Beat Algorithm Selection: XGBoost went from 0.89 → 0.93 AUC with engineered features. The same features improved Logistic Regression from 0.83 → 0.85. In this project, time spent on features delivered more value than tuning hyperparameters.
Previous Campaign Success is Gold: Customers who said "yes" before are 10x more likely to convert again. This validates the marketing wisdom: "your best leads are people who already know you." Banks should maintain detailed campaign response history.
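To illustrate the class-weighting lesson above: scale_pos_weight is typically set to the ratio of negatives to positives, so that the weighted positive class carries about as much total weight as the negative class. The sketch below shows only that bookkeeping on hypothetical labels. It is a simplification; inside XGBoost the weight is applied to the training loss for positive rows, not to row counts.

```python
def scale_pos_weight_for(labels):
    """Negative-to-positive ratio, the usual starting value for scale_pos_weight."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    if pos == 0:
        raise ValueError("no positive examples")
    return neg / pos


def weighted_class_mass(labels, pos_weight):
    """Total weight per class when positives are weighted and negatives are not."""
    pos_mass = sum(pos_weight for y in labels if y == 1)
    neg_mass = sum(1.0 for y in labels if y == 0)
    return pos_mass, neg_mass


# Hypothetical sample: 1,000 rows at an 11.3% subscription rate.
toy_labels = [1] * 113 + [0] * 887
weight = scale_pos_weight_for(toy_labels)  # 887 / 113, roughly 7.85
pos_mass, neg_mass = weighted_class_mass(toy_labels, weight)
```

No rows are added or removed, which is the practical difference from SMOTE: the training set stays exactly as observed and only the loss contribution of each subscriber changes.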
Deployment Strategy for Banks
Phase 1: Pilot (30 days)
- Deploy model to score next month's campaign prospects
- Target top 20% with term deposit offers
- Track conversion rate vs. control group
- Measure cost savings and ROI lift
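One way to compare the pilot's conversion rate against the control group is a two-proportion z-test. The sketch below uses only the standard library. The function name and the counts are hypothetical; the right test and sample sizes would depend on how the pilot is actually designed.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Group A is the targeted group, group B the control.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero, test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical pilot counts: 340 of 1,000 targeted vs. 113 of 1,000 control.
z_stat, p_val = two_proportion_z_test(340, 1000, 113, 1000)
```

A small p-value here would suggest the difference between targeted and control conversion is unlikely to be chance, provided customers were assigned to the two groups at random.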
Phase 2: Production (90 days)
- Integrate model API into CRM system
- Build real-time scoring dashboard for campaign managers
- A/B test different propensity score thresholds
- Expand to other products (credit cards, loans, investment accounts)
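Before A/B testing thresholds live, candidate cutoffs can be backtested on historical scored data to see how contact volume trades off against conversion rate. A minimal sketch follows, with a hypothetical function name and toy data.

```python
def sweep_thresholds(labels, scores, thresholds):
    """For each cutoff, count who would be contacted and how many converted."""
    rows = []
    for t in thresholds:
        # Outcomes for customers whose propensity score meets the cutoff
        selected = [y for y, s in zip(labels, scores) if s >= t]
        contacts = len(selected)
        conversions = sum(selected)
        rows.append({
            "threshold": t,
            "contacts": contacts,
            "conversions": conversions,
            "conversion_rate": conversions / contacts if contacts else 0.0,
        })
    return rows


# Hypothetical historical scores and outcomes (1 = subscribed).
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
sweep = sweep_thresholds(toy_labels, toy_scores, thresholds=[0.3, 0.5, 0.7])
```

Raising the cutoff shrinks the contact list and usually raises the conversion rate, at the price of missing some subscribers. The table this produces is a natural way to shortlist thresholds for the A/B test.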