Machine Learning

HUD Inspection Analytics

Predictive modeling for housing inspection scores and failure rates, enabling proactive maintenance and $500K+ in cost savings.

Prediction Accuracy: 87%
Cost Savings: $500K+
Properties Analyzed: 15K+
Interactive Dashboard: Tableau

R, Tableau, Scikit-Learn, Python, Random Forest, XGBoost, Pandas


Predictive Analytics Pipeline

End-to-end machine learning pipeline from data ingestion to model deployment, with interactive Tableau dashboards for property managers.

1. Problem

HUD housing inspections are reactive, leading to costly emergency repairs and safety violations that could have been prevented. Property managers lack data-driven insights to prioritize preventative maintenance.

Industry Pain: Emergency repairs cost 3-5x more than planned maintenance. Failed inspections trigger costly penalties and threaten HUD funding eligibility.

2. Solution

Built a regression model to predict inspection scores based on historical data, property characteristics, and maintenance records. The system enables proactive intervention before failures occur.

→ Random Forest regression for score prediction
→ Feature engineering from 50+ property attributes
→ Interactive Tableau dashboard for property managers

3. Impact

✓ Achieved 87% accuracy in predicting inspection failures
✓ Enabled $500K+ in preventative maintenance savings
✓ Created interactive Tableau dashboards for property managers

Model Performance Metrics

Binary Classification (Pass/Fail)

Accuracy (overall correct predictions): 87%
Precision (positive predictions that were correct): 83%
Recall (actual failures correctly identified): 91%
F1-Score (harmonic mean of precision and recall): 0.87
ROC-AUC (model's discriminative ability): 0.92
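As a quick consistency check on the classification figures, the F1-score can be recomputed from the reported precision and recall. The short Python sketch below does only that; the function name is illustrative and not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values from the table above: precision 83%, recall 91%
reported_f1 = f1_score(0.83, 0.91)
print(f"F1 = {reported_f1:.2f}")
```

Rounded to two decimals, the result agrees with the reported F1-score of 0.87.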

Regression (Score Prediction)

R² Score (variance explained by model): 0.79
RMSE (root mean squared error, points): 12.4
MAE (mean absolute error, points): 8.2
MAPE (mean absolute percentage error): 9.1%

Interpretation: Model predictions are typically within ±8 points of actual inspection score (on 0-100 scale).
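For readers who want the definitions behind the four regression metrics, the following Python sketch computes them from paired actual and predicted scores. The sample scores are made up purely for illustration and are not project data.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent) and R² for paired score lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when an actual score is 0
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}


# Made-up inspection scores on the 0-100 scale, for illustration only
actual = [80, 60, 90, 70]
predicted = [75, 65, 85, 75]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because RMSE squares each error before averaging, a few large misses raise it above MAE, which is one plausible reading of the gap between the reported 12.4 and 8.2.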

Top 10 Predictive Features

Years Since Last Major Repair: 18%
Building Age: 15%
Previous Inspection Score: 13%
Number of Outstanding Work Orders: 11%
Occupancy Rate: 9%
Maintenance Budget Per Unit: 8%
Property Manager Tenure: 7%
Number of Units: 6%
Geographic Region: 5%
Building Type (Low-rise/High-rise): 4%
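One common way to obtain rankings like these is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The Python sketch below illustrates the idea with a toy model and made-up rows. It is an assumption that the percentages above were derived this way; the helper names and data are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature_names, n_repeats=5, seed=0):
    """Average increase in MSE when each feature's values are shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = {}
    for name in feature_names:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def to_percent_shares(importances):
    """Express non-negative importances as shares of their total."""
    clipped = {k: max(v, 0.0) for k, v in importances.items()}
    total = sum(clipped.values())
    if total == 0:
        return {k: 0.0 for k in clipped}
    return {k: 100 * v / total for k, v in clipped.items()}


# Toy data: the stand-in model uses building_age and ignores unit_count
rows = [{"building_age": 3 * i, "unit_count": (i * 7) % 11 + 1} for i in range(20)]
toy_model = lambda r: 100 - 0.5 * r["building_age"]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets, ["building_age", "unit_count"])
shares = to_percent_shares(importances)
print(shares)
```

A feature the model never reads gets zero importance, since shuffling it cannot change any prediction.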

Technical Implementation

Feature Engineering Pipeline

Created 50+ engineered features from raw property data, including temporal features, interaction terms, and aggregated maintenance metrics.

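# Feature engineering in R
library(dplyr)
library(lubridate)

# Assumes one row per inspection and a property_id column (the column name is
# an assumption), so lag features are computed within each property in date order.
create_features <- function(df, as_of = Sys.Date()) {
  df %>%
    group_by(property_id) %>%
    arrange(inspection_date, .by_group = TRUE) %>%
    mutate(
      # Temporal features
      years_since_last_repair = as.numeric(
        difftime(inspection_date, last_major_repair, units = "days")
      ) / 365,
      
      # as_of defaults to today; pass a fixed date for reproducible runs
      months_since_inspection = as.numeric(
        difftime(as_of, inspection_date, units = "days")
      ) / 30,
      
      # Aggregated maintenance metrics
      work_orders_per_unit = total_work_orders / number_of_units,
      maintenance_spend_per_unit = annual_maintenance_budget / number_of_units,
      
      # Interaction terms
      age_occupancy_interaction = building_age * occupancy_rate,
      
      # Binned features
      building_age_category = case_when(
        building_age < 10 ~ "New",
        building_age < 30 ~ "Moderate",
        TRUE ~ "Old"
      ),
      
      # Lag features (previous inspection scores for the same property)
      prev_score_1 = lag(inspection_score, 1),
      prev_score_2 = lag(inspection_score, 2),
      
      # Rolling average of the three prior inspections; the current score is
      # excluded so the target does not leak into its own predictor
      avg_score_3_years = (
        lag(inspection_score, 1) +
          lag(inspection_score, 2) +
          lag(inspection_score, 3)
      ) / 3
    ) %>%
    ungroup()
}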
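The lag and rolling-average features are only meaningful when computed per property and in date order. As a language-neutral illustration, here is a minimal Python sketch of that grouping logic. The field names `property_id`, `inspection_date`, and `inspection_score` and the sample records are assumptions for the example.

```python
from collections import defaultdict
from datetime import date


def add_lag_features(inspections):
    """Attach prior-score features to each inspection record.

    Records are grouped by property and sorted by date, so each record
    only sees scores from that property's earlier inspections.
    """
    by_property = defaultdict(list)
    for row in inspections:
        by_property[row["property_id"]].append(row)

    featured = []
    for rows in by_property.values():
        rows.sort(key=lambda r: r["inspection_date"])
        for i, row in enumerate(rows):
            prior = [r["inspection_score"] for r in rows[:i]]
            featured.append({
                **row,
                "prev_score_1": prior[-1] if len(prior) >= 1 else None,
                "prev_score_2": prior[-2] if len(prior) >= 2 else None,
                "avg_prior_3": sum(prior[-3:]) / 3 if len(prior) >= 3 else None,
            })
    return featured


# Made-up records, deliberately out of order
inspections = [
    {"property_id": "A", "inspection_date": date(2021, 5, 1), "inspection_score": 90},
    {"property_id": "A", "inspection_date": date(2019, 5, 1), "inspection_score": 70},
    {"property_id": "B", "inspection_date": date(2020, 5, 1), "inspection_score": 85},
    {"property_id": "A", "inspection_date": date(2022, 5, 1), "inspection_score": 60},
    {"property_id": "A", "inspection_date": date(2020, 5, 1), "inspection_score": 80},
]
featured = add_lag_features(inspections)
for row in featured:
    print(row["property_id"], row["inspection_date"], row["prev_score_1"], row["avg_prior_3"])
```

Properties with fewer prior inspections than a feature needs get `None`, which mirrors the `NA` values the R lag features produce.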

Random Forest Model Training

Used Random Forest regression for its ability to handle non-linear relationships and provide feature importance rankings.

# Random Forest model in R
library(randomForest)
library(caret)

# Split data (80/20, stratified on score quantiles)
set.seed(42)
train_index <- createDataPartition(df$inspection_score, p = 0.8, list = FALSE)
train_data <- df[train_index, ]
test_data <- df[-train_index, ]

# Train baseline Random Forest model
rf_model <- randomForest(
  inspection_score ~ .,
  data = train_data,
  ntree = 500,              # Number of trees
  mtry = 7,                 # Features tried per split (about sqrt of ~50 features)
  importance = TRUE,        # Calculate feature importance
  nodesize = 5,             # Minimum terminal node size
  maxnodes = NULL           # No limit on terminal nodes
)

# Cross-validation for hyperparameter tuning
control <- trainControl(
  method = "cv",
  number = 5,               # 5-fold cross-validation
  search = "grid"
)

tune_grid <- expand.grid(
  mtry = c(5, 7, 10, 15)
)

rf_tuned <- train(
  inspection_score ~ .,
  data = train_data,
  method = "rf",
  trControl = control,
  tuneGrid = tune_grid,
  ntree = 500
)

# Best model (underlying randomForest object, kept for inspection)
best_model <- rf_tuned$finalModel

# Predictions: call predict() on the caret object rather than finalModel so the
# formula preprocessing (e.g. dummy-coded factors) is applied to test_data
predictions <- predict(rf_tuned, newdata = test_data)

# Evaluation metrics
residuals <- test_data$inspection_score - predictions
rmse <- sqrt(mean(residuals^2))
mae <- mean(abs(residuals))
# R² as 1 - SS_res / SS_tot; squared correlation ignores bias in the predictions
ss_tot <- sum((test_data$inspection_score - mean(test_data$inspection_score))^2)
r_squared <- 1 - sum(residuals^2) / ss_tot

cat("RMSE:", rmse, "\n")
cat("MAE:", mae, "\n")
cat("R²:", r_squared, "\n")
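To make the 5-fold cross-validation step concrete, the Python sketch below shows how rows can be split so that each row is used for validation exactly once. It illustrates the general technique only; caret's internal fold construction differs in detail, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_rows, k=5, seed=42):
    """Yield (training_indices, validation_indices) pairs for k-fold CV."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


for fold_number, (training, validation) in enumerate(k_fold_indices(10, k=5), start=1):
    print(f"fold {fold_number}: train on {len(training)} rows, validate on {len(validation)}")
```

Each candidate hyperparameter value is then scored by its average validation error across the folds, and the value with the lowest average is kept.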

Model Comparison & Ensemble

Tested multiple algorithms and created an ensemble model combining Random Forest and XGBoost for improved accuracy.

# Model comparison
library(xgboost)

# Predictor columns; this assumes they are all numeric
# (encode factors first, e.g. with model.matrix)
features <- setdiff(names(train_data), "inspection_score")

# XGBoost model
xgb_train <- xgb.DMatrix(
  data = as.matrix(train_data[, features]),
  label = train_data$inspection_score
)

xgb_model <- xgb.train(
  params = list(
    max_depth = 6,
    eta = 0.1,
    objective = "reg:squarederror"
  ),
  data = xgb_train,
  nrounds = 100,
  verbose = 0
)

# Linear regression baseline
lm_model <- lm(inspection_score ~ ., data = train_data)

# Ensemble prediction (weighted average)
ensemble_predict <- function(rf_model, xgb_model, new_data) {
  rf_pred <- predict(rf_model, new_data)
  xgb_pred <- predict(xgb_model, xgb.DMatrix(as.matrix(new_data[, features])))
  
  # Weighted average (70% RF, 30% XGBoost based on validation performance)
  0.7 * rf_pred + 0.3 * xgb_pred
}

# Each model needs its own predict() call, so collect prediction vectors
model_predictions <- list(
  "Random Forest" = predict(rf_model, test_data),
  "XGBoost" = predict(xgb_model, xgb.DMatrix(as.matrix(test_data[, features]))),
  "Linear Regression" = predict(lm_model, test_data),
  "Ensemble" = ensemble_predict(rf_model, xgb_model, test_data)
)

# Compare RMSE across models
for (model_name in names(model_predictions)) {
  preds <- model_predictions[[model_name]]
  rmse <- sqrt(mean((test_data$inspection_score - preds)^2))
  cat(model_name, "RMSE:", rmse, "\n")
}

# Results:
# Random Forest RMSE: 12.8
# XGBoost RMSE: 13.2
# Linear Regression RMSE: 18.5
# Ensemble RMSE: 12.4  ← Best performance
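A weighted average can beat both of its component models when their errors partly cancel. The Python sketch below shows this with made-up predictions, along with one simple way a blend weight could be chosen on a validation set. The numbers are illustrative only, and the grid search over weights is an assumption about how "based on validation performance" might be done.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def weighted_ensemble(pred_a, pred_b, weight_a=0.7):
    """Blend two prediction lists; weight_a goes to the first model."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Made-up validation data: model A over-predicts, model B under-predicts
actual = [80, 60, 90]
model_a_pred = [84, 64, 94]
model_b_pred = [72, 52, 82]

# Try weights 0.0, 0.1, ..., 1.0 and keep the one with the lowest RMSE
candidate_weights = [w / 10 for w in range(11)]
best_weight = min(
    candidate_weights,
    key=lambda w: rmse(actual, weighted_ensemble(model_a_pred, model_b_pred, w)),
)

blended = weighted_ensemble(model_a_pred, model_b_pred, best_weight)
print("model A RMSE:", rmse(actual, model_a_pred))
print("model B RMSE:", rmse(actual, model_b_pred))
print("best weight:", best_weight, "ensemble RMSE:", rmse(actual, blended))
```

The weight should be chosen on data held out from the final test set; otherwise the ensemble's reported error will be optimistic.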

Interactive Tableau Dashboard

Property managers use this dashboard to identify at-risk properties, prioritize maintenance budgets, and track inspection trends across their portfolio.

Risk Heatmap

Geographic visualization showing properties color-coded by predicted failure risk. Managers can click on properties to see detailed predictions and recommended interventions.

Interactive map, Risk scoring, Drill-down details

Trend Analysis

Time-series charts showing inspection score trends over the past 5 years. Identifies properties with declining scores requiring immediate attention.
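One simple way to flag declining scores is to fit a least-squares line to each property's score history and look at the slope. The Python sketch below does this with made-up histories. The cutoff of -2 points per year is a hypothetical threshold, not a value from the dashboard.

```python
def score_trend(years, scores):
    """Least-squares slope of score versus year, in points per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores))
    return sxy / sxx


def declining_properties(history, max_slope=-2.0):
    """Return IDs of properties whose trend is at or below max_slope."""
    flagged = []
    for property_id, points in history.items():
        years = [year for year, _ in points]
        scores = [score for _, score in points]
        if score_trend(years, scores) <= max_slope:
            flagged.append(property_id)
    return flagged


# Made-up five-year histories of (year, inspection score)
history = {
    "A": [(2019, 85), (2020, 82), (2021, 78), (2022, 74), (2023, 70)],
    "B": [(2019, 80), (2020, 81), (2021, 79), (2022, 80), (2023, 80)],
}
print(declining_properties(history))
```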

Historical trends, Forecasting, Anomaly detection

Portfolio Summary

Executive dashboard showing total properties, predicted failures, estimated maintenance costs, and ROI from preventative actions.

KPI cards, Cost analysis, ROI calculator

Maintenance Prioritization

Ranked list of properties by urgency (predicted score + days until inspection). Helps allocate limited maintenance budgets to highest-impact properties.
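The exact way predicted score and days until inspection are combined is not specified here, so the Python sketch below shows just one possible ordering: properties predicted to fail come first, then the soonest inspection, then the lowest predicted score. The pass mark of 60, the field names, and the sample properties are all assumptions for illustration.

```python
def prioritize(properties, pass_mark=60):
    """Order properties by urgency.

    Sort key: predicted failures first, then fewest days until
    inspection, then lowest predicted score.
    """
    return sorted(
        properties,
        key=lambda p: (
            p["predicted_score"] >= pass_mark,  # False (predicted fail) sorts first
            p["days_until_inspection"],
            p["predicted_score"],
        ),
    )


# Made-up portfolio
properties = [
    {"id": "A", "predicted_score": 55, "days_until_inspection": 30},
    {"id": "B", "predicted_score": 70, "days_until_inspection": 10},
    {"id": "C", "predicted_score": 58, "days_until_inspection": 90},
    {"id": "D", "predicted_score": 90, "days_until_inspection": 5},
]
ranking = [p["id"] for p in prioritize(properties)]
print(ranking)
```

A tiered key like this keeps the ranking easy to explain to property managers; a single weighted urgency score is an alternative when finer trade-offs are needed.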

Priority ranking, Budget allocation, What-if scenarios

Key Insights from Analysis

📅

Maintenance Timing Matters More Than Amount

Properties that performed regular quarterly maintenance scored 18 points higher on average than properties that spent the same total amount on reactive repairs.

Recommendation: Shift from reactive to preventative maintenance schedules.

🏗️

Building Age is Not Destiny

Older buildings (30+ years) with consistent maintenance budgets outperformed newer buildings (10-15 years) with neglected upkeep by 12 points on average.

Recommendation: Focus on maintenance consistency rather than building age.

👥

Property Manager Experience Drives Outcomes

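Properties managed by individuals with 5+ years of tenure had 15% fewer failed inspections. Tenure (7% feature importance) was more predictive than several property characteristics, including number of units, geographic region, and building type.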

Recommendation: Invest in training and retention of experienced property managers.

📊

Occupancy Rate Sweet Spot

Properties with 90-95% occupancy scored highest. Both very low (<70%) and very high (>98%) occupancy correlated with lower inspection scores.

Recommendation: Balance occupancy optimization with maintenance capacity.

Lessons Learned

Domain Expertise Over Model Complexity: Initially built a deep learning model with 200+ features. Property managers could not trust the black-box predictions. Switched to Random Forest for interpretability; model performance was similar, but adoption increased dramatically.

Historical Data Had Survivorship Bias: Dataset only included properties still in the HUD program. Properties that failed out were missing, skewing predictions. Added historical failure data to correct for bias, improving real-world accuracy by 12%.

Tableau Convinced Stakeholders: Initial Python visualizations failed to get buy-in. Created an interactive Tableau dashboard where managers could filter by their properties and see personalized recommendations. Adoption went from 20% to 85%.

Threshold Tuning Was Critical: Model with 87% accuracy still produced too many false positives at default 0.5 threshold. Adjusted to 0.35 based on cost-benefit analysis (cost of preventative maintenance vs. cost of failed inspection). Reduced alert fatigue by 60%.
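A cost-based choice of cutoff can be sketched as follows: for each candidate threshold, add up the cost of false alarms (unneeded preventative work) and of missed failures, then keep the cheapest. This Python example uses made-up probabilities, outcomes, and costs; the 1:4 cost ratio is purely hypothetical and not a figure from the project.

```python
def total_cost(threshold, failure_probs, failed, cost_false_alarm, cost_missed_failure):
    """Sum the cost of wrong decisions when flagging at the given threshold."""
    cost = 0.0
    for prob, did_fail in zip(failure_probs, failed):
        flagged = prob >= threshold
        if flagged and not did_fail:
            cost += cost_false_alarm       # preventative work that was not needed
        elif not flagged and did_fail:
            cost += cost_missed_failure    # failed inspection that went unflagged
    return cost


def best_threshold(candidates, failure_probs, failed, cost_false_alarm, cost_missed_failure):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(t, failure_probs, failed, cost_false_alarm, cost_missed_failure),
    )


# Made-up validation set: predicted failure probabilities and actual outcomes
failure_probs = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
failed = [True, True, True, False, False, False]
candidates = [0.2, 0.3, 0.4, 0.5, 0.6]

chosen = best_threshold(candidates, failure_probs, failed, cost_false_alarm=1, cost_missed_failure=4)
print("chosen threshold:", chosen)
```

The larger the cost of a missed failure relative to a false alarm, the lower the cost-minimizing threshold tends to be.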
