Business Failure Predictor

Machine Learning Model for Bankruptcy Risk Assessment

78,682 Companies Analyzed | 82.6% AUC Score | Random Forest ML

Project Overview

A machine learning model that predicts company bankruptcy risk from 18 financial metrics. Trained on 78,682 American companies over 20 years (1999-2018), the model identifies patterns that signal business failure.

Using Random Forest classification, the model achieves an AUC of 0.826 (82.6%) and 92% overall accuracy, making it a useful tool for assessing financial distress in businesses.
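
The two headline metrics measure different things, and a short sketch may help show how each is obtained. The example below is not the project's actual training script: it uses synthetic data from `make_classification` as a stand-in for the real dataset, and the 5% failure rate, the split size, and the forest settings are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 18 features, with roughly 5% of
# rows in the positive ("failed") class. Both numbers are assumptions.
X, y = make_classification(
    n_samples=2000, n_features=18, n_informative=8,
    weights=[0.95, 0.05], random_state=42,
)

# Stratify so the rare class keeps the same share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# AUC is computed from predicted probabilities, accuracy from hard labels.
failure_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, failure_proba)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Baseline: accuracy of always predicting the majority class ("healthy").
majority_baseline = 1.0 - y_test.mean()

print(f"AUC: {auc:.3f}  accuracy: {accuracy:.3f}  baseline: {majority_baseline:.3f}")
```

When failures are rare, the majority-class baseline already scores high on accuracy, which is why AUC is usually the more informative of the two numbers.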

Python | Scikit-Learn | Random Forest | Pandas

82.6% - AUC Score (Strong Predictive Power)
92% - Overall Accuracy
78,682 - Companies in Dataset
18 - Financial Ratios Analyzed

Model Performance

Feature Importance

[Figure: Feature Importance]

Feature importance analysis shows which financial metrics contribute most to predicting business failure. X8, X6, and X15 rank as the top three indicators, making them the ratios that most strongly signal distress in this model.

Note: The dataset uses anonymized financial ratios (X1-X18), which likely include debt-to-equity ratios, profitability metrics, liquidity ratios, and asset turnover measures commonly used in bankruptcy prediction models.
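
A ranking like this can be read from a fitted Random Forest through its `feature_importances_` attribute. The sketch below assumes impurity-based importances (the source does not say which importance measure was used) and fits on synthetic data, so the ranking it prints will not match the X8, X6, X15 result above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names mirror the anonymized X1-X18 convention; the data itself
# is synthetic and only illustrates the mechanics.
feature_names = [f"X{i}" for i in range(1, 19)]
X, y = make_classification(
    n_samples=1000, n_features=18, n_informative=6, random_state=0
)
X = pd.DataFrame(X, columns=feature_names)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across all features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

top_three = importances.head(3)
print(top_three)
```

Impurity-based importances can favor features with many distinct values, so permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check.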

ROC Curve

[Figure: ROC Curve]

The ROC curve shows the model's ability to distinguish between healthy and failing companies. With an AUC of 0.8262, the model clearly outperforms random guessing (AUC of 0.5) and discriminates well across different probability thresholds.
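
One way to read an AUC value: it equals the probability that a randomly chosen failing company receives a higher risk score than a randomly chosen healthy one. The sketch below computes AUC directly from that definition on a handful of made-up scores; `pairwise_auc` is a hypothetical helper written for illustration, and in practice `sklearn.metrics.roc_auc_score` does the same job far more efficiently.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (failed, healthy) pairs ranked correctly.

    Ties between a failed and a healthy score count as half a win.
    """
    failed = [s for y, s in zip(labels, scores) if y == 1]
    healthy = [s for y, s in zip(labels, scores) if y == 0]
    if not failed or not healthy:
        raise ValueError("AUC needs at least one example of each class")

    wins = 0.0
    for f in failed:
        for h in healthy:
            if f > h:
                wins += 1.0
            elif f == h:
                wins += 0.5
    return wins / (len(failed) * len(healthy))


# Toy data: 1 = failed, 0 = healthy; scores are predicted failure probabilities.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.40, 0.20, 0.35, 0.80]

auc = pairwise_auc(labels, scores)
print(auc)  # 7 of the 8 failed/healthy pairs are ordered correctly -> 0.875
```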

Confusion Matrix

[Figure: Confusion Matrix]

The confusion matrix breaks down predictions by outcome. Treating failure as the positive class, the model correctly identifies 14,079 healthy companies (true negatives) and 330 failing companies (true positives), with relatively low false positive and false negative rates.
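
The four cells of a confusion matrix, and the rates derived from them, can be sketched in a few lines. The labels below are toy values, not the project's test set, and `confusion_counts` is a hypothetical helper; `sklearn.metrics.confusion_matrix` returns the same counts for real data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating 1 ("failed") as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Toy example: eight healthy companies (0) and two failed ones (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)       # share of all predictions that are right
recall = tp / (tp + fn)                  # share of failures that were caught
precision = tp / (tp + fp)               # share of failure alerts that were right
false_positive_rate = fp / (fp + tn)     # share of healthy companies flagged

print(tn, fp, fn, tp)
print(accuracy, recall, precision, false_positive_rate)
```

In this toy case accuracy is 0.8 even though only half of the failures are caught, which shows why recall on the failing class deserves a look alongside overall accuracy.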

Risk Score Distribution

[Figure: Risk Score Distribution]

The distribution of predicted failure probabilities shows clear separation between companies that were actually healthy (green) and those that actually failed (orange), demonstrating the model's effectiveness at risk stratification.
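
The per-class distribution described above amounts to binning predicted probabilities separately for each true outcome. A minimal sketch, using made-up scores and a hypothetical `probability_histogram` helper with five equal-width bins (the bin count is an arbitrary choice here):

```python
def probability_histogram(probabilities, n_bins=5):
    """Count probabilities in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in probabilities:
        # A probability of exactly 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up predicted failure probabilities, grouped by actual outcome.
healthy_scores = [0.02, 0.05, 0.11, 0.18, 0.25, 0.31]
failed_scores = [0.35, 0.62, 0.77, 0.93]

healthy_hist = probability_histogram(healthy_scores)
failed_hist = probability_histogram(failed_scores)

print(healthy_hist)  # [4, 2, 0, 0, 0]: mass concentrated near 0
print(failed_hist)   # [0, 1, 0, 2, 1]: mass shifted toward 1
```

The less the two histograms overlap, the easier it is to pick a probability threshold that separates the groups.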

Key Insights

Technical Implementation

The project demonstrates an end-to-end machine learning workflow, including data preprocessing, feature scaling, model training, hyperparameter tuning, and comprehensive evaluation.
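
As a rough illustration of how those stages can fit together in scikit-learn, the sketch below chains scaling and a Random Forest in a `Pipeline` and tunes it with `GridSearchCV`. The data is synthetic, and the parameter grid, fold count, and step names (`scale`, `forest`) are assumptions made for this example, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 18 financial ratios.
X, y = make_classification(
    n_samples=600, n_features=18, n_informative=6, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Keeping the scaler inside the pipeline means it is re-fit on each
# training fold, so no statistics leak from validation data.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=1)),
])

# Hypothetical search space; "<step>__<param>" addresses a pipeline step.
param_grid = {
    "forest__max_depth": [5, None],
    "forest__min_samples_leaf": [1, 5],
}

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# With scoring="roc_auc", score() reports AUC of the refit best model.
test_auc = search.score(X_test, y_test)
print(search.best_params_, round(test_auc, 3))
```

Tree-based models such as Random Forest are insensitive to feature scale, so the scaling step mainly matters if the same pipeline is later compared against scale-sensitive models.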

Key techniques: