Email Spam Detector

NLP-Powered Classification with 100% Accuracy

320 Emails Analyzed | 100% Accuracy | TF-IDF + ML

Project Overview

An NLP-powered email spam detector built with machine learning. Working from a dataset of 320 emails, the system uses TF-IDF vectorization and three classification algorithms to identify spam patterns, and it reaches 100% accuracy on a held-out test set of 64 emails.

Built using Python, scikit-learn, and natural language processing techniques, this project demonstrates text preprocessing, feature extraction, and multi-model comparison.
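
To make the feature-extraction step concrete, here is a minimal pure-Python sketch of TF-IDF weighting. It is an illustration only, not the project's code: the `tokenize` and `tfidf` helpers and the toy emails are assumptions, and scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will differ from the plain tf * idf shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, using plain tf * idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, made up for illustration (not the project's data).
emails = [
    "Win FREE cash now",
    "Free prizes, claim your cash",
    "Meeting at 2pm tomorrow",
]
weights = tfidf(emails)
print(weights[0])
```

In this toy corpus, "win" occurs in only one email and therefore gets a higher weight than "free", which occurs in two. That is the core idea of TF-IDF: terms that are frequent in one message but rare across the corpus carry the most weight.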

100% Classification Accuracy
320 Emails Analyzed
3 ML Models Compared
45 TF-IDF Features

Top Spam Word Indicators

[Figure: Spam Words]

The 15 words that appear most frequently in spam emails. "Free", "cash", and "prizes" top the list and are the strongest indicators of spam content.
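
A ranking like this can be produced by counting word occurrences in the spam-labelled messages. The sketch below is one plausible way to do it, not necessarily how the chart was generated; the `labelled` messages, the tiny `STOP_WORDS` set, and the `top_spam_words` helper are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical labelled messages, made up for illustration.
labelled = [
    ("spam", "Free cash prizes! Claim your free cash now"),
    ("spam", "Win free prizes instantly"),
    ("ham", "Are you free for lunch tomorrow?"),
]

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"your", "you", "for", "are", "now"}


def top_spam_words(messages, n=15):
    """Count non-stop-word tokens in spam messages and return the n most common."""
    counts = Counter()
    for label, text in messages:
        if label == "spam":
            counts.update(
                word
                for word in re.findall(r"[a-z]+", text.lower())
                if word not in STOP_WORDS
            )
    return counts.most_common(n)


top = top_spam_words(labelled, n=3)
print(top)
```

Raw frequency in spam does not account for how often a word also appears in legitimate mail ("free" shows up in the ham message above too), so a stricter indicator would compare spam and ham counts.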

Model Performance Comparison

[Figure: Model Comparison]

All three models (Naive Bayes, Logistic Regression, and Random Forest) achieved 100% accuracy on the 64-email test set.
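
A comparison of this kind can be sketched with scikit-learn as follows. This is a minimal sketch under several assumptions: the toy `spam` and `ham` lists stand in for the real dataset, `MultinomialNB` is assumed as the Naive Bayes variant, `max_features=45` is assumed to be how the 45 TF-IDF features were obtained, and the 80/20 stratified split is inferred from 64 test emails out of 320.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, made up for illustration (not the project's 320 emails).
spam = [
    "Win free cash now",
    "Claim your free prizes today",
    "Free gift card, reply now",
    "You won cash prizes, click here",
    "Click here for free cash",
    "Congratulations, you won a free prize",
    "Reply now to claim your cash",
    "Exclusive offer: free prizes inside",
    "Get cash instantly, click now",
    "You have been selected to win prizes",
]
ham = [
    "Meeting at 2pm tomorrow",
    "Please review the attached report",
    "Thank you for your support",
    "Lunch on Friday?",
    "Let me know if you need anything",
    "The project deadline moved to Monday",
    "Can you send the meeting notes",
    "See you at the conference room",
    "Here is the agenda for next week",
    "Thanks for the quick reply",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)  # 1 = spam, 0 = ham

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    # The vectorizer sits inside the pipeline so it is fitted on training data only.
    pipeline = make_pipeline(TfidfVectorizer(max_features=45), model)
    pipeline.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
```

Keeping the vectorizer inside each pipeline means the test emails never influence the vocabulary or IDF weights, which keeps the reported test accuracy honest.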

Confusion Matrix

[Figure: Confusion Matrix]

All 64 test emails were classified correctly, with zero false positives and zero false negatives.
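
For readers unfamiliar with the four cells of a confusion matrix, the following sketch counts them by hand and derives accuracy from them. The `confusion_counts` helper and the labels are made up for illustration and deliberately include errors; in the result reported here, the false-positive and false-negative cells are both zero, so the true positives and true negatives sum to 64.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Made-up labels for illustration, not the project's test set.
y_true = ["spam", "ham", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]

c = confusion_counts(y_true, y_pred)
accuracy = (c["tp"] + c["tn"]) / len(y_true)
print(dict(c), f"accuracy={accuracy:.2f}")
```

A false positive here is a legitimate email flagged as spam, and a false negative is a spam email that slips through; for a mail filter the former is usually the more costly mistake.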

Using the Model

The trained model is saved and ready to classify new emails. The spam detector can be used to automatically filter incoming messages.

Quick Start:

    from predict import classify_email

    email = "Congratulations! You won $1000!"
    result = classify_email(email)
    # Output: SPAM (99% confidence)

Example Classifications:

✓ HAM (Legitimate Email)
"Meeting at 2pm tomorrow in conference room B"
Confidence: 98%
✗ SPAM (Junk Email)
"Click here for FREE prizes and cash instantly!"
Confidence: 99%
✓ HAM (Legitimate Email)
"Thank you for your support. Let me know if you need anything."
Confidence: 97%
✗ SPAM (Junk Email)
"You have been selected to win a gift card! Reply now!"
Confidence: 99%
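
Confidence values like these typically come from a classifier's predicted class probabilities. The sketch below shows one way a saved model could be reloaded and queried; it is hypothetical and does not reproduce the project's `predict` module. The function name `classify_email_sketch`, the toy training data, the choice of `MultinomialNB`, and the use of `pickle` for persistence are all assumptions.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, made up for illustration.
train_texts = [
    "Win free cash now",
    "Claim your free prizes today",
    "Click here for free cash prizes",
    "You won a gift card, reply now",
    "Meeting at 2pm tomorrow",
    "Thank you for your support",
    "Please review the attached report",
    "Let me know if you need anything",
]
train_labels = ["SPAM"] * 4 + ["HAM"] * 4

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Round-trip through pickle to mimic saving the model and loading it later.
restored = pickle.loads(pickle.dumps(model))


def classify_email_sketch(text, model=restored):
    """Return (label, confidence), where confidence is the top class probability."""
    probabilities = model.predict_proba([text])[0]
    best = probabilities.argmax()
    return str(model.classes_[best]), float(probabilities[best])


label, confidence = classify_email_sketch("Claim your FREE cash prizes now")
print(f"{label} ({confidence:.0%} confidence)")
```

With two classes, the reported confidence is always at least 50%, since it is the larger of the two class probabilities.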

Key Features: