Email Spam Detector | Monse Rojo

Project Overview

An NLP-powered email spam detector built with machine learning. Trained on 320 emails, the system uses TF-IDF vectorization and several classification algorithms to identify spam patterns, reaching 100% accuracy on its test set.
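As a rough illustration of the TF-IDF step, here is a minimal standard-library Python sketch. It assumes scikit-learn's default `TfidfVectorizer` behavior (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalization). The project's actual vectorizer settings aren't stated, and the function names `tokenize`, `fit_idf`, and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

# Same token rule as scikit-learn's default: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for every term seen in docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc: str, idf: dict[str, float]) -> dict[str, float]:
    """Raw term counts times IDF, L2-normalized. Terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


if __name__ == "__main__":
    corpus = [
        "Click here for FREE prizes and cash instantly!",
        "Meeting at 2pm tomorrow in conference room B",
    ]
    idf = fit_idf(corpus)
    vec = tfidf(corpus[0], idf)
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Words that occur in many emails get a lower IDF weight, so rarer, more distinctive terms carry more weight in each vector.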

Built using Python, scikit-learn, and natural language processing techniques, this project demonstrates text preprocessing, feature extraction, and multi-model comparison.
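The exact preprocessing steps aren't listed. A minimal sketch, assuming lowercasing, removal of non-letter characters, and a small stopword list, might look like this; the `preprocess` name and the `STOPWORDS` set are illustrative assumptions, not the project's own.

```python
import re

# Illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "the", "for", "in", "at", "to", "of", "on",
             "is", "you", "your", "me", "my"}


def preprocess(text: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, split, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # strips digits, punctuation, symbols
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("Click here for FREE prizes and cash instantly!"))
```

One trade-off: stripping digits and symbols discards signals such as "$1000", which is common in spam. A real pipeline might map them to a placeholder token instead of deleting them.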

Top Spam Word Indicators

The 15 words that appear most frequently in spam emails. "Free", "cash", and "prizes" are the strongest indicators of spam content.
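A frequency ranking like this can be computed by counting words across the spam-labeled emails only. The sketch below assumes labels are the strings `"spam"` and `"ham"` and uses a small hypothetical stopword set; `top_spam_words` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical stopword set so that filler words don't dominate the ranking.
STOPWORDS = {"a", "an", "and", "the", "for", "in", "at", "to", "of", "you", "your"}


def top_spam_words(emails: list[str], labels: list[str], n: int = 15,
                   stopwords: set[str] = STOPWORDS) -> list[tuple[str, int]]:
    """Return the n most common non-stopword words across emails labeled "spam"."""
    counts = Counter()
    for text, label in zip(emails, labels):
        if label == "spam":
            counts.update(w for w in re.findall(r"[a-z]+", text.lower())
                          if w not in stopwords)
    return counts.most_common(n)
```

Raw frequency in spam is not the same as discriminative power: a word common in both spam and ham ranks high here but helps little. Comparing each word's rate in spam against its rate in ham gives a more direct measure of how strongly it indicates spam.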

Model Performance Comparison

All three models (Naive Bayes, Logistic Regression, and Random Forest) achieved 100% accuracy on the test set.
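To show what the Naive Bayes model computes, here is a from-scratch multinomial Naive Bayes with add-alpha (Laplace) smoothing in plain Python. This is a sketch, not the project's code; scikit-learn provides the same technique as `MultinomialNB`. The class name, tokenizer, and `accuracy` helper are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for class priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    p = (self.word_counts[c][tok] + self.alpha) / (self.totals[c] + self.alpha * v)
                    score += math.log(p)
            scores[c] = score
        return scores

    def predict_proba(self, text: str) -> dict[str, float]:
        scores = self.log_scores(text)
        m = max(scores.values())  # subtract max for numerical stability
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, text: str) -> str:
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def accuracy(model: NaiveBayes, texts: list[str], labels: list[str]) -> float:
    """Fraction of texts whose predicted label matches the true label."""
    return sum(model.predict(t) == y for t, y in zip(texts, labels)) / len(labels)
```

Comparing models then amounts to fitting each one on the same training split and computing `accuracy` (or a confusion matrix) on the same held-out emails.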

Confusion Matrix

Perfect classification with zero false positives or false negatives on 64 test emails.
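The confusion-matrix cells can be tallied directly from true and predicted labels. This sketch treats spam as the positive class, so a false positive is a legitimate email flagged as spam and a false negative is spam that slips through; the function names are hypothetical.

```python
def confusion_counts(y_true: list[str], y_pred: list[str],
                     positive: str = "spam") -> dict[str, int]:
    """Tally true/false positives and negatives, with `positive` as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def summarize(c: dict[str, int]) -> dict[str, float]:
    """Accuracy, precision, and recall from confusion counts (0.0 when undefined)."""
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }
```

With 64 test emails, each misclassification would lower accuracy by 1/64, about 1.6 percentage points.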

Using the Model

The trained model is saved and ready to classify new emails. The spam detector can be used to automatically filter incoming messages.
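The page doesn't say how the model is saved. One common option is Python's built-in `pickle` module, sketched below; `save_model`, `load_model`, and any file path are hypothetical. For scikit-learn estimators, `joblib.dump` and `joblib.load` are a widely used alternative.

```python
import pickle


def save_model(model, path: str) -> None:
    """Serialize any picklable model object to a file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str):
    """Load a model previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Unpickling can execute arbitrary code, so only load model files from a trusted source.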

Quick Start:

from predict import classify_email

email = "Congratulations! You won $1000!"
result = classify_email(email)
print(result)  # Output: SPAM (99% confidence)
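The internals of `classify_email` aren't shown. Assuming it obtains a spam probability from the classifier (scikit-learn's `MultinomialNB`, `LogisticRegression`, and `RandomForestClassifier` all provide `predict_proba`), a hypothetical helper like `format_prediction` could produce output in this format, reporting the probability of whichever label wins.

```python
def format_prediction(p_spam: float) -> str:
    """Turn a spam probability into a label plus the confidence in that label."""
    if not 0.0 <= p_spam <= 1.0:
        raise ValueError("p_spam must be between 0 and 1")
    if p_spam >= 0.5:
        label, confidence = "SPAM", p_spam
    else:
        label, confidence = "HAM", 1.0 - p_spam  # confidence that it is ham
    return f"{label} ({confidence:.0%} confidence)"
```

With a fixed 0.5 cutoff the reported confidence is always at least 50%. A production filter might raise the spam cutoff to reduce false positives, at the cost of letting more spam through.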

Example Classifications:

✓ HAM (Legitimate Email)

"Meeting at 2pm tomorrow in conference room B"

Confidence: 98%

✗ SPAM (Junk Email)

"Click here for FREE prizes and cash instantly!"

Confidence: 99%

✓ HAM (Legitimate Email)

"Thank you for your support. Let me know if you need anything."

Confidence: 97%