Machine Learning Model for Bankruptcy Risk Assessment
A machine learning model that predicts company bankruptcy risk by analyzing 18 financial metrics. Trained on nearly 80,000 American companies spanning 20 years (1999-2018), the model identifies patterns that signal business failure.
Using a Random Forest classifier, the model achieves an AUC of 0.826 and 92% accuracy, making it a useful tool for assessing financial distress in businesses.
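As a rough illustration of how these two metrics are obtained, here is a minimal sketch using scikit-learn. The data is a synthetic stand-in generated with `make_classification` (18 features, a small share of positive "failed" cases), and the split ratio and hyperparameters are assumptions, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: 18 features, roughly 5% failed companies.
X, y = make_classification(
    n_samples=4000, n_features=18, n_informative=8,
    weights=[0.95, 0.05], random_state=0,
)
# Stratify so the rare "failed" class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# AUC is computed from predicted probabilities, accuracy from hard labels.
failure_prob = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, failure_prob)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"AUC: {auc:.3f}, accuracy: {accuracy:.3f}")
```

On data where failures are rare, accuracy alone can look high even when few failures are caught, which is why it is worth reading it alongside the AUC.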
Analysis reveals which financial metrics are most predictive of business failure. X8, X6, and X15 emerge as the top three indicators, representing critical financial ratios that signal distress.
Note: The dataset uses anonymized financial ratios (X1-X18), which likely include debt-to-equity ratios, profitability metrics, liquidity ratios, and asset turnover measures commonly used in bankruptcy prediction models.
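A ranking like this can be read off a fitted Random Forest. The sketch below assumes the impurity-based `feature_importances_` attribute from scikit-learn was used, which the text does not confirm, and it runs on synthetic data, so the features it ranks highest are not the project's X8, X6, and X15.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Anonymized names matching the dataset's convention; the data itself is synthetic.
feature_names = [f"X{i}" for i in range(1, 19)]
X, y = make_classification(
    n_samples=2000, n_features=18, n_informative=6, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across all features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked[:3]:
    print(f"{name}: {importance:.3f}")
```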
The ROC curve demonstrates the model's ability to distinguish between healthy and failing companies. With an AUC of 0.8262, the model clearly outperforms random guessing (an AUC of 0.5) and shows strong discriminative power across different probability thresholds.
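One way to interpret the AUC is as the probability that a randomly chosen failing company receives a higher predicted risk than a randomly chosen healthy one. The sketch below computes it that way in plain Python; the function name `auc_from_scores` and the toy labels and scores are made up for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (failed, healthy) pairs ranked correctly.

    Ties between a failed and a healthy company count as half a win.
    """
    failed = [s for label, s in zip(labels, scores) if label == 1]
    healthy = [s for label, s in zip(labels, scores) if label == 0]
    if not failed or not healthy:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for f in failed:
        for h in healthy:
            if f > h:
                wins += 1.0
            elif f == h:
                wins += 0.5
    return wins / (len(failed) * len(healthy))


# Toy example: 1 = failed, 0 = healthy.
labels = [0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.40, 0.35, 0.20, 0.80]
print(auc_from_scores(labels, scores))  # 0.875 (7 of 8 pairs ranked correctly)
```

A model that assigns every company the same score gets 0.5 under this definition, which is the random-guessing baseline.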
The confusion matrix breaks down prediction accuracy. The model correctly identifies 14,079 healthy companies and 330 failing companies, with relatively low false positive and false negative rates.
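The four cells of a confusion matrix are enough to derive accuracy, recall, precision, and the false positive rate. The sketch below uses hypothetical counts chosen only for illustration; they are not the project's results, and the helper name `summarize_confusion` is an assumption.

```python
def summarize_confusion(tn, fp, fn, tp):
    """Derive common rates from confusion-matrix counts (failed = positive class)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly failing companies that the model flags.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of flagged companies that actually fail.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of healthy companies wrongly flagged as failing.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical counts, not taken from the project's confusion matrix.
metrics = summarize_confusion(tn=940, fp=10, fn=30, tp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.96 while recall is only 0.40, which shows why the per-class breakdown matters when failures are rare.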
The distribution of predicted failure probabilities shows clear separation between companies that remained healthy (green) and companies that failed (orange), demonstrating the model's effectiveness at risk stratification.
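The counts behind a plot like this can be produced by binning predicted probabilities separately for each true outcome. This is a minimal sketch with made-up labels and probabilities; the helper `bin_probabilities` and the choice of five bins are assumptions.

```python
def bin_probabilities(probs, n_bins=5):
    """Count probabilities in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in probs:
        # A probability of exactly 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Toy data: 1 = actually failed, 0 = actually healthy.
labels = [0, 0, 0, 0, 1, 1, 0, 1]
probs = [0.02, 0.10, 0.15, 0.30, 0.55, 0.90, 0.45, 0.25]

healthy_counts = bin_probabilities([p for p, l in zip(probs, labels) if l == 0])
failed_counts = bin_probabilities([p for p, l in zip(probs, labels) if l == 1])
print("healthy:", healthy_counts)  # [3, 1, 1, 0, 0]
print("failed: ", failed_counts)   # [0, 1, 1, 0, 1]
```

Good separation shows up as healthy companies concentrated in the low-probability bins and failed companies spread toward the higher ones.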
The project demonstrates an end-to-end machine learning workflow, including data preprocessing, feature scaling, model training, hyperparameter tuning, and comprehensive evaluation.
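The workflow steps above could be chained as follows. This is a sketch under assumptions: the use of `Pipeline`, `StandardScaler`, and `GridSearchCV`, the parameter grid, and the synthetic data are illustrative choices, not the project's confirmed setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 18 financial ratios.
X, y = make_classification(
    n_samples=1500, n_features=18, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Small illustrative grid; step-name prefixes route parameters to the forest.
param_grid = {
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [5, None],
}
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
test_auc = search.score(X_test, y_test)  # uses the roc_auc scorer set above
print(f"held-out AUC: {test_auc:.3f}")
```

Tree-based models are largely insensitive to feature scaling, so the scaler here mainly mirrors the listed workflow; keeping it in the pipeline avoids leaking test-fold statistics into training.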
Key techniques: