HUD Inspection Analytics
Predictive modeling for housing inspection scores and failure rates, enabling proactive maintenance and $500K+ in cost savings.
Predictive Analytics Pipeline
End-to-end machine learning pipeline from data ingestion to model deployment, with interactive Tableau dashboards for property managers.
1. Problem
Maintenance ahead of HUD housing inspections is largely reactive, which leads to costly emergency repairs and safety violations that could have been prevented. Property managers lack data-driven insights for prioritizing preventative maintenance.
2. Solution
Built a regression model to predict inspection scores based on historical data, property characteristics, and maintenance records. The system enables proactive intervention before failures occur.
- Random Forest regression for score prediction
- Feature engineering from 50+ property attributes
- Interactive Tableau dashboard for property managers
3. Impact
- Achieved 87% accuracy in predicting inspection failures
- Enabled $500K+ in preventative maintenance savings
- Created interactive Tableau dashboards for property managers
Model Performance Metrics
[Charts: Binary Classification (Pass/Fail); Regression (Score Prediction); Top 10 Predictive Features]
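Pass/fail metrics can be derived from predicted scores by applying a failing cutoff. The sketch below assumes a property fails when its score is below 60 (a common REAC convention; the page does not say which cutoff or which classifier was used), and `classification_metrics` is a hypothetical helper. Accuracy alone can look strong when most properties pass, so precision and recall on the failing class are reported alongside it.

```r
# Hypothetical helper: turn scores into pass/fail and compute confusion-matrix metrics.
# ASSUMPTION: "fail" means a score below fail_cutoff (60 is illustrative).
classification_metrics <- function(actual, predicted, fail_cutoff = 60) {
  actual_fail <- actual < fail_cutoff
  predicted_fail <- predicted < fail_cutoff
  tp <- sum(actual_fail & predicted_fail)    # failures correctly flagged
  fp <- sum(!actual_fail & predicted_fail)   # passes wrongly flagged
  fn <- sum(actual_fail & !predicted_fail)   # failures missed
  tn <- sum(!actual_fail & !predicted_fail)  # passes correctly cleared
  c(
    accuracy  = (tp + tn) / length(actual),
    precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall    = if (tp + fn > 0) tp / (tp + fn) else NA_real_
  )
}

# Toy example with made-up scores (not project data)
actual    <- c(55, 72, 88, 41, 65, 90)
predicted <- c(58, 61, 85, 62, 59, 93)
classification_metrics(actual, predicted)
```

Here one failure is caught, one is missed, and one passing property is wrongly flagged, so accuracy is 4/6 while precision and recall are both 0.5.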
Technical Implementation
Feature Engineering Pipeline
Created 50+ engineered features from raw property data, including temporal features, interaction terms, and aggregated maintenance metrics.
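# Feature engineering in R
library(dplyr)
library(lubridate)

# Assumes one row per inspection and a property_id column (hypothetical name),
# so lag features look back only within the same property.
create_features <- function(df, as_of_date = Sys.Date()) {
  df %>%
    arrange(property_id, inspection_date) %>%
    group_by(property_id) %>%
    mutate(
      # Temporal features (365.25 = average days per year)
      years_since_last_repair = as.numeric(
        difftime(inspection_date, last_major_repair, units = "days")
      ) / 365.25,
      # Measured against as_of_date; pass a fixed date for reproducible features
      months_since_inspection = as.numeric(
        difftime(as_of_date, inspection_date, units = "days")
      ) / 30.44,
      # Aggregated maintenance metrics
      work_orders_per_unit = total_work_orders / number_of_units,
      maintenance_spend_per_unit = annual_maintenance_budget / number_of_units,
      # Interaction terms
      age_occupancy_interaction = building_age * occupancy_rate,
      # Binned features
      building_age_category = case_when(
        building_age < 10 ~ "New",
        building_age < 30 ~ "Moderate",
        TRUE ~ "Old"
      ),
      # Lag features (this property's previous inspection scores)
      prev_score_1 = lag(inspection_score, 1),
      prev_score_2 = lag(inspection_score, 2),
      # Rolling average of the three previous inspections (not calendar years).
      # The current score is the prediction target, so it is excluded to avoid
      # leaking the label into the features.
      avg_score_3_years = (prev_score_1 + prev_score_2 +
                             lag(inspection_score, 3)) / 3
    ) %>%
    ungroup()
}

Random Forest Model Training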
Used Random Forest regression for its ability to handle non-linear relationships and provide feature importance rankings.
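# Random Forest model in R
library(dplyr)
library(randomForest)
library(caret)

# df: output of create_features(). Keep only model inputs: identifier and raw
# date columns (hypothetical names) are dropped, character columns become
# factors, and rows with NA lag features are removed, because randomForest()
# fails on missing values by default.
model_df <- df %>%
  select(-property_id, -inspection_date, -last_major_repair) %>%
  mutate(across(where(is.character), as.factor)) %>%
  na.omit() %>%
  as.data.frame()

# Split data (80/20). A random row split can put the same property in both
# sets; splitting by property gives a more conservative test estimate.
set.seed(42)
train_index <- createDataPartition(model_df$inspection_score, p = 0.8, list = FALSE)
train_data <- model_df[train_index, ]
test_data <- model_df[-train_index, ]

# Train Random Forest model
rf_model <- randomForest(
  inspection_score ~ .,
  data = train_data,
  ntree = 500,        # Number of trees
  mtry = 7,           # Features tried per split (about sqrt(50); regression default is p/3)
  importance = TRUE,  # Calculate feature importance
  nodesize = 5,       # Minimum terminal node size
  maxnodes = NULL     # No limit on terminal nodes
)

# Cross-validation for hyperparameter tuning
control <- trainControl(
  method = "cv",
  number = 5,         # 5-fold cross-validation
  search = "grid"
)
tune_grid <- expand.grid(
  mtry = c(5, 7, 10, 15)
)
rf_tuned <- train(
  inspection_score ~ .,
  data = train_data,
  method = "rf",
  trControl = control,
  tuneGrid = tune_grid,
  ntree = 500
)

# Best model (underlying randomForest object, e.g. for importance())
best_model <- rf_tuned$finalModel

# Predict through the caret object so test data gets the same preprocessing
# (dummy-coding of factors) that train() applied to the training data
predictions <- predict(rf_tuned, newdata = test_data)

# Evaluation metrics
actual <- test_data$inspection_score
rmse <- sqrt(mean((actual - predictions)^2))
mae <- mean(abs(actual - predictions))
# Coefficient of determination (1 - SSE/SST), not squared correlation
r_squared <- 1 - sum((actual - predictions)^2) / sum((actual - mean(actual))^2)
cat("RMSE:", rmse, "\n")
cat("MAE:", mae, "\n")
cat("R²:", r_squared, "\n")

Model Comparison & Ensemble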
Tested multiple algorithms and created an ensemble model combining Random Forest and XGBoost for improved accuracy.
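The code below hard-codes a 70/30 weighting described as "based on validation performance." A minimal base-R sketch of how such a weight could be chosen, assuming you already have Random Forest and XGBoost predictions on a validation set that is separate from the test set; `choose_ensemble_weight` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: pick the RF weight w in w * rf + (1 - w) * xgb by
# minimizing RMSE on held-out validation data (never on the test set).
choose_ensemble_weight <- function(y, rf_pred, xgb_pred,
                                   grid = seq(0, 1, by = 0.1)) {
  rmse <- sapply(grid, function(w) {
    blended <- w * rf_pred + (1 - w) * xgb_pred
    sqrt(mean((y - blended)^2))
  })
  list(weight = grid[which.min(rmse)], rmse = min(rmse))
}

# Toy validation data constructed so that a 0.7 / 0.3 blend fits exactly
rf_val  <- c(70, 80, 90)
xgb_val <- c(80, 60, 100)
y_val   <- c(73, 74, 93)
choose_ensemble_weight(y_val, rf_val, xgb_val)
```

A coarse grid keeps the choice easy to explain to stakeholders; with more models, a constrained least-squares fit of the blend weights is a common alternative.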
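# Model comparison
library(xgboost)

# XGBoost needs a numeric matrix: use the numeric predictor columns
# (factor columns would need one-hot encoding first, omitted here)
features <- setdiff(names(train_data)[sapply(train_data, is.numeric)],
                    "inspection_score")

# XGBoost model
xgb_train <- xgb.DMatrix(
  data = as.matrix(train_data[, features, drop = FALSE]),
  label = train_data$inspection_score
)
xgb_model <- xgb.train(
  params = list(
    objective = "reg:squarederror",
    max_depth = 6,
    eta = 0.1
  ),
  data = xgb_train,
  nrounds = 100,
  verbose = 0
)

# Linear regression baseline on the same numeric features
lm_model <- lm(inspection_score ~ .,
               data = train_data[, c("inspection_score", features)])

# Ensemble prediction (weighted average)
ensemble_predict <- function(rf_model, xgb_model, new_data) {
  rf_pred <- predict(rf_model, new_data)
  xgb_pred <- predict(xgb_model, as.matrix(new_data[, features, drop = FALSE]))
  # Weighted average (70% RF, 30% XGBoost based on validation performance)
  0.7 * rf_pred + 0.3 * xgb_pred
}

# Each model needs its own predict() call (XGBoost takes a matrix and the
# ensemble is a plain function), so collect test predictions first
test_preds <- list(
  "Random Forest" = predict(rf_model, test_data),
  "XGBoost" = predict(xgb_model, as.matrix(test_data[, features, drop = FALSE])),
  "Linear Regression" = predict(lm_model, test_data),
  "Ensemble" = ensemble_predict(rf_model, xgb_model, test_data)
)

# Compare RMSE across models
for (model_name in names(test_preds)) {
  rmse <- sqrt(mean((test_data$inspection_score - test_preds[[model_name]])^2))
  cat(model_name, "RMSE:", rmse, "\n")
}

# Results:
# Random Forest RMSE: 12.8
# XGBoost RMSE: 13.2
# Linear Regression RMSE: 18.5
# Ensemble RMSE: 12.4 ← Best performance

Interactive Tableau Dashboard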
Property managers use this dashboard to identify at-risk properties, prioritize maintenance budgets, and track inspection trends across their portfolio.
Risk Heatmap
Geographic visualization showing properties color-coded by predicted failure risk. Managers can click on properties to see detailed predictions and recommended interventions.
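Color-coding by risk usually means bucketing a continuous prediction into a few tiers. A minimal base-R sketch, assuming tiers are cut from the predicted score; `assign_risk_tier` and its cutoffs (60 and 75) are hypothetical, not the dashboard's actual values.

```r
# Hypothetical risk tiers for map coloring, based on predicted score.
# ASSUMPTION: cutoffs of 60 and 75 are illustrative only.
assign_risk_tier <- function(predicted_score, cutoffs = c(60, 75)) {
  cut(predicted_score,
      breaks = c(-Inf, cutoffs, Inf),
      labels = c("High", "Medium", "Low"),
      right = FALSE)  # intervals are [a, b): a score of exactly 60 is "Medium"
}

tiers <- assign_risk_tier(c(52, 60, 74.9, 75, 91))
as.character(tiers)
```

Returning a factor keeps the tier order (High, Medium, Low) intact when the data is exported to a visualization tool.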
Trend Analysis
Time-series charts showing inspection score trends over the past 5 years. Identifies properties with declining scores requiring immediate attention.
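One way to flag declining scores is to fit a per-property linear trend and flag steep negative slopes. A minimal base-R sketch; the column names, `min_inspections`, `slope_cutoff` (points per year), and the `flag_declining` helper are all hypothetical.

```r
# Hypothetical helper: flag properties whose scores trend downward.
# ASSUMPTION: columns property_id, inspection_date (Date), inspection_score.
flag_declining <- function(df, min_inspections = 3, slope_cutoff = -2) {
  per_property <- split(df, df$property_id)
  slopes <- sapply(per_property, function(d) {
    if (nrow(d) < min_inspections) return(NA_real_)  # too few points for a trend
    years <- as.numeric(d$inspection_date - min(d$inspection_date)) / 365.25
    unname(coef(lm(d$inspection_score ~ years))[2])  # score points per year
  })
  data.frame(property_id = names(slopes),
             slope_per_year = slopes,
             declining = !is.na(slopes) & slopes <= slope_cutoff,
             row.names = NULL)
}

# Toy data: property A drops about 6 points per year, B improves
inspections <- data.frame(
  property_id = rep(c("A", "B"), each = 3),
  inspection_date = as.Date(c("2020-01-01", "2021-01-01", "2022-01-01",
                              "2020-01-01", "2021-01-01", "2022-01-01")),
  inspection_score = c(90, 84, 78, 70, 72, 75)
)
flag_declining(inspections)
```

A slope is less sensitive to a single bad inspection than comparing only the last two scores, at the cost of needing several inspections per property.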
Portfolio Summary
Executive dashboard showing total properties, predicted failures, estimated maintenance costs, and ROI from preventative actions.
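ROI from preventative actions is typically the net benefit divided by the spend. A minimal sketch, assuming "avoided cost" is an estimate of the reactive repair and failed-inspection costs the preventative work prevents; `preventative_roi` and the figures are hypothetical.

```r
# Hypothetical ROI for preventative work:
# ROI = (estimated avoided cost - preventative spend) / preventative spend
# ASSUMPTION: avoided_cost is an estimate supplied by the analyst.
preventative_roi <- function(avoided_cost, preventative_spend) {
  stopifnot(preventative_spend > 0)
  total_spend <- sum(preventative_spend)
  (sum(avoided_cost) - total_spend) / total_spend
}

# Made-up figures for two properties
preventative_roi(avoided_cost = c(40000, 25000),
                 preventative_spend = c(15000, 10000))
```

With these made-up numbers, $25,000 of preventative work that avoids an estimated $65,000 of costs returns 1.6, i.e. $1.60 net per dollar spent.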
Maintenance Prioritization
Ranked list of properties by urgency, combining predicted score and days until inspection. Helps allocate limited maintenance budgets to the highest-impact properties.
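The page does not say how predicted score and days until inspection are combined. One simple option, sketched here under stated assumptions, multiplies a risk term (points below 100) by a time-pressure term that falls linearly to zero at a planning horizon; `prioritize_properties`, `horizon_days`, and the column names are hypothetical.

```r
# Hypothetical urgency score: risk x time pressure.
# ASSUMPTION: columns predicted_score (0-100) and days_until_inspection.
prioritize_properties <- function(properties, horizon_days = 180) {
  # Risk: how far the predicted score falls short of 100 (higher = worse)
  risk <- 100 - properties$predicted_score
  # Time pressure: 1 for an inspection today, falling linearly to 0 at horizon_days
  time_pressure <- pmax(0, 1 - properties$days_until_inspection / horizon_days)
  properties$urgency <- risk * time_pressure
  properties[order(-properties$urgency), ]
}

props <- data.frame(
  property_id = c("A", "B", "C"),
  predicted_score = c(55, 80, 50),
  days_until_inspection = c(30, 10, 200)
)
prioritize_properties(props)
```

Property C has the lowest predicted score but ranks last because its inspection falls outside the 180-day horizon; a different horizon or a weighted sum would change that trade-off.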
Key Insights from Analysis
Maintenance Timing Matters More Than Amount
Properties that performed regular quarterly maintenance scored 18 points higher on average than properties that spent the same total amount on reactive repairs.
Shift from reactive to preventative maintenance schedules.
Building Age is Not Destiny
Older buildings (30+ years) with consistent maintenance budgets outperformed newer buildings (10-15 years) with neglected upkeep by 12 points on average.
Focus on maintenance consistency rather than building age.
Property Manager Experience Drives Outcomes
Properties managed by individuals with 5+ years of tenure had 15% fewer failed inspections. Tenure was more predictive than property characteristics.
Invest in training and retention of experienced property managers.
Occupancy Rate Sweet Spot
Properties with 90-95% occupancy scored highest. Both very low (<70%) and very high (>98%) occupancy correlated with lower inspection scores.
Balance occupancy optimization with maintenance capacity.
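A "sweet spot" like this comes from binning occupancy and comparing mean scores per band. A minimal base-R sketch, assuming occupancy is stored as a fraction between 0 and 1; the bands follow the text, the in-between bands are assumptions, and `score_by_occupancy_band` is a hypothetical helper.

```r
# Hypothetical: mean inspection score per occupancy band.
# ASSUMPTION: occupancy_rate is a fraction in [0, 1].
score_by_occupancy_band <- function(occupancy_rate, inspection_score) {
  band <- cut(occupancy_rate,
              breaks = c(0, 0.70, 0.90, 0.95, 0.98, 1),
              labels = c("<70%", "70-90%", "90-95%", "95-98%", "98-100%"),
              right = FALSE, include.lowest = TRUE)  # [a, b) bands, last is [0.98, 1]
  means <- tapply(inspection_score, band, mean)      # NA for empty bands
  setNames(as.vector(means), levels(band))
}

# Made-up data for illustration
score_by_occupancy_band(c(0.65, 0.92, 0.93, 0.99), c(60, 85, 89, 70))
```

Band means only show association; the occupancy effect could be confounded with management or building characteristics.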
Lessons Learned
Domain Expertise Over Model Complexity: Initially built a deep learning model with 200+ features. Property managers could not trust its black-box predictions. Switched to Random Forest for interpretability; model performance was similar, but adoption increased dramatically.
Historical Data Had Survivorship Bias: The dataset only included properties still in the HUD program, so properties that had failed out of the program were missing, which skewed predictions. Adding historical failure data corrected for this bias and improved real-world accuracy by 12%.
Tableau Convinced Stakeholders: Initial Python visualizations failed to get buy-in. Created an interactive Tableau dashboard where managers could filter by their properties and see personalized recommendations. Adoption went from 20% to 85%.
Threshold Tuning Was Critical: Even with 87% accuracy, the model produced too many false positives at the default 0.5 threshold. Adjusted the threshold to 0.35 based on a cost-benefit analysis (cost of preventative maintenance vs. cost of a failed inspection), which reduced alert fatigue by 60%.
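A cost-benefit threshold can be chosen by scanning candidate thresholds on validation data and keeping the one with the lowest total cost. A minimal sketch, assuming validation-set failure probabilities `p_fail`, observed outcomes `failed`, and analyst-estimated costs; `choose_threshold` is a hypothetical helper.

```r
# Hypothetical helper: threshold that minimizes total cost on validation data.
# cost_fp: preventative work on a property that would have passed.
# cost_fn: a failed inspection that was not flagged.
choose_threshold <- function(p_fail, failed, cost_fp, cost_fn,
                             grid = seq(0.05, 0.95, by = 0.05)) {
  total_cost <- sapply(grid, function(t) {
    flagged <- p_fail >= t
    cost_fp * sum(flagged & !failed) + cost_fn * sum(!flagged & failed)
  })
  grid[which.min(total_cost)]  # first threshold reaching the minimum cost
}

# Toy validation data: a missed failure costs 5x an unneeded repair
p_fail <- c(0.12, 0.33, 0.42, 0.61, 0.83)
failed <- c(FALSE, FALSE, TRUE, TRUE, TRUE)
choose_threshold(p_fail, failed, cost_fp = 1, cost_fn = 5)
```

If the predicted probabilities are well calibrated and correct decisions cost nothing, the same trade-off has a closed form: flag when p_fail >= cost_fp / (cost_fp + cost_fn).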
Future Roadmap
Enhanced Modeling
- Incorporate weather data (extreme temperatures correlate with HVAC failures)
- Add computer vision for property condition assessment
- Survival analysis for time-to-failure predictions
- Causal inference to isolate maintenance intervention effects
Operational Integration
- API integration with property management systems
- Automated work order generation for predicted failures
- Mobile app for on-site inspectors
- Cost optimization algorithm for maintenance budgets