# Twitter/X Trending Hashtags Analysis (2020-2025)

**Author:** Monse Rojo  
**Date:** December 2024  
**Dataset:** 12,036 trending hashtags from Twitter/X (2020-2025)

---

## Executive Summary

This analysis explores trending hashtag patterns on Twitter/X from 2020 to 2025, a period marked by significant global events including the COVID-19 pandemic, political upheavals, and cultural shifts. By analyzing over 12,000 trending entries, we aim to:

- Identify temporal patterns in viral content
- Correlate major world events with trending behavior
- Analyze hashtag performance metrics
- Extract insights for social media strategy

---

## 1. Setup and Data Loading

First, we'll import the necessary libraries and load our dataset.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings

# Configure visualization settings
plt.style.use('dark_background')
sns.set_palette("husl")
warnings.filterwarnings('ignore')

# Set figure size defaults
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 11

print("✅ Libraries loaded successfully")

In [None]:
# Load the dataset
df = pd.read_csv('../data/twitter-trending-hashtags.csv')

print(f"Dataset loaded: {len(df):,} rows")
print(f"Columns: {list(df.columns)}")
print(f"\nFirst few rows:")
df.head(10)
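
Later cells read the columns `tag`, `tweets`, `rank`, `peak_date`, and `year`, so it can help to confirm they exist right after loading. The following is a minimal sketch of such a check; the helper name `missing_columns` and the tiny inline CSV are illustrative assumptions, not part of the original dataset or notebook.

```python
import io
import pandas as pd

# Columns the later cells rely on (taken from how `df` is used in this notebook)
REQUIRED_COLUMNS = ['tag', 'tweets', 'rank', 'peak_date', 'year']

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Tiny made-up sample standing in for the real CSV (it lacks 'year' on purpose)
sample_csv = io.StringIO(
    "tag,tweets,rank,peak_date\n"
    "#example,1000,1,2020-03-15\n"
)
sample = pd.read_csv(sample_csv)

print(missing_columns(sample))  # ['year']
```

On the real data, `missing_columns(df)` returning an empty list would indicate that the remaining cells have the columns they expect.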

## 2. Data Exploration and Cleaning

Let's examine the structure and quality of our data.

In [None]:
# Basic dataset information
print("Dataset Info:")
print("=" * 50)
df.info()

print("\n" + "=" * 50)
print("\nBasic Statistics:")
print("=" * 50)
df.describe()

In [None]:
# Check for missing values
print("Missing Values:")
print("=" * 50)
missing = df.isnull().sum()
missing_pct = (missing / len(df)) * 100
missing_df = pd.DataFrame({
    'Missing Count': missing,
    'Percentage': missing_pct
})
print(missing_df[missing_df['Missing Count'] > 0])

if missing.sum() == 0:
    print("\n✅ No missing values found!")

In [None]:
# Convert peak_date to datetime
df['peak_date'] = pd.to_datetime(df['peak_date'])

# Create additional time-based features
df['month'] = df['peak_date'].dt.month
df['month_name'] = df['peak_date'].dt.month_name()
df['day_of_week'] = df['peak_date'].dt.day_name()
df['quarter'] = df['peak_date'].dt.quarter

print("✅ Date features created")
print(f"\nDate range: {df['peak_date'].min()} to {df['peak_date'].max()}")
print(f"Total years covered: {df['year'].nunique()}")
print(f"Years: {sorted(df['year'].unique())}")
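
The conversion above raises an error on any unparseable date, and the notebook uses both `peak_date` and a separate `year` column. As a hedged sketch of a sanity check, the example below parses dates with `errors='coerce'` and compares the stored year with the year derived from the date. The sample rows are made up for illustration.

```python
import pandas as pd

# Made-up rows: one unparseable date and one year that disagrees with its date
sample = pd.DataFrame({
    'tag': ['#a', '#b', '#c'],
    'peak_date': ['2020-03-15', 'not a date', '2021-07-01'],
    'year': [2020, 2020, 2022],
})

# errors='coerce' turns unparseable strings into NaT instead of raising
sample['peak_date'] = pd.to_datetime(sample['peak_date'], errors='coerce')
n_unparsed = int(sample['peak_date'].isna().sum())

# Compare the stored year with the year derived from the parsed date
parsed = sample.dropna(subset=['peak_date'])
year_mismatch = parsed[parsed['peak_date'].dt.year != parsed['year']]

print(f"Unparseable dates: {n_unparsed}")
print(f"Rows where year disagrees with peak_date: {len(year_mismatch)}")
```

If either count is nonzero on the real data, the yearly and monthly charts would be grouping some rows inconsistently.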

## 3. Temporal Analysis

### 3.1 Trends by Year

In [None]:
# Count of trending hashtags by year
yearly_counts = df['year'].value_counts().sort_index()

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.bar(yearly_counts.index, yearly_counts.values, 
              color='#00ff00', edgecolor='white', linewidth=2)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{int(height):,}',
            ha='center', va='bottom', fontsize=12, fontweight='bold')

ax.set_xlabel('Year', fontsize=14, fontweight='bold')
ax.set_ylabel('Number of Trending Hashtags', fontsize=14, fontweight='bold')
ax.set_title('Trending Hashtags by Year (2020-2025)', 
             fontsize=16, fontweight='bold', pad=20)
ax.grid(axis='y', alpha=0.3, linestyle='--')

plt.tight_layout()
plt.savefig('../visualizations/trends_by_year.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nYearly Distribution:")
print(yearly_counts)
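
Raw yearly counts can be easier to compare as year-over-year changes. The sketch below uses made-up counts in place of `yearly_counts`. If the first or last year in the dataset is only partially covered, its change should be read with caution.

```python
import pandas as pd

# Made-up yearly counts standing in for `yearly_counts`
yearly_counts_demo = pd.Series({2020: 200, 2021: 250, 2022: 200})

# Percentage change relative to the previous year (the first year has no baseline)
yoy_change = yearly_counts_demo.pct_change() * 100

print(yoy_change.round(1))
```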

### 3.2 Monthly Patterns

In [None]:
# Monthly trending patterns
monthly_counts = df.groupby('month_name').size().reindex([
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
])

fig, ax = plt.subplots(figsize=(14, 6))
bars = ax.bar(range(12), monthly_counts.values, 
              color='#00ff00', edgecolor='white', linewidth=2)

# Highlight top 3 months
top_3_indices = monthly_counts.nlargest(3).index
for i, month in enumerate(monthly_counts.index):
    if month in top_3_indices:
        bars[i].set_color('#ff00ff')

ax.set_xticks(range(12))
ax.set_xticklabels(monthly_counts.index, rotation=45, ha='right')
ax.set_xlabel('Month', fontsize=14, fontweight='bold')
ax.set_ylabel('Number of Trending Hashtags', fontsize=14, fontweight='bold')
ax.set_title('Trending Patterns by Month', fontsize=16, fontweight='bold', pad=20)
ax.grid(axis='y', alpha=0.3, linestyle='--')

plt.tight_layout()
plt.savefig('../visualizations/trends_by_month.png', dpi=300, bbox_inches='tight')
plt.show()

### 3.3 Day of Week Analysis

In [None]:
# Day of week trending patterns
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_counts = df.groupby('day_of_week').size().reindex(day_order)

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.bar(range(7), dow_counts.values, 
              color='#00ff00', edgecolor='white', linewidth=2)

# Highlight weekends
bars[5].set_color('#ff6600')  # Saturday
bars[6].set_color('#ff6600')  # Sunday

ax.set_xticks(range(7))
ax.set_xticklabels(day_order, rotation=45, ha='right')
ax.set_xlabel('Day of Week', fontsize=14, fontweight='bold')
ax.set_ylabel('Number of Trending Hashtags', fontsize=14, fontweight='bold')
ax.set_title('Trending Activity by Day of Week', fontsize=16, fontweight='bold', pad=20)
ax.grid(axis='y', alpha=0.3, linestyle='--')

plt.tight_layout()
plt.savefig('../visualizations/trends_by_day.png', dpi=300, bbox_inches='tight')
plt.show()
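
Since a week has five weekdays and only two weekend days, comparing raw totals would favor weekdays. A minimal sketch of a per-day comparison follows. The dates are made up, and `is_weekend` is a hypothetical helper column.

```python
import pandas as pd

# Made-up peak dates; 2024-01-01 is a Monday
dates = pd.to_datetime([
    '2024-01-01', '2024-01-02', '2024-01-03',  # Mon, Tue, Wed
    '2024-01-06', '2024-01-06', '2024-01-07',  # Sat, Sat, Sun
])
demo = pd.DataFrame({'peak_date': dates})

# dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
demo['is_weekend'] = demo['peak_date'].dt.dayofweek >= 5

weekend_count = int(demo['is_weekend'].sum())
weekday_count = len(demo) - weekend_count

# Normalize by the number of days in each group
weekday_per_day = weekday_count / 5
weekend_per_day = weekend_count / 2

print(f"Average per weekday: {weekday_per_day:.2f}")
print(f"Average per weekend day: {weekend_per_day:.2f}")
```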

## 4. Tweet Volume Analysis

Analyzing the distribution and patterns of tweet volumes.

In [None]:
# Tweet volume statistics
print("Tweet Volume Statistics:")
print("=" * 50)
print(f"Mean: {df['tweets'].mean():,.0f}")
print(f"Median: {df['tweets'].median():,.0f}")
print(f"Max: {df['tweets'].max():,.0f}")
print(f"Min: {df['tweets'].min():,.0f}")
print(f"Std Dev: {df['tweets'].std():,.0f}")

# Top 10 most tweeted hashtags
print("\n" + "=" * 50)
print("Top 10 Most Tweeted Hashtags:")
print("=" * 50)
top_10 = df.nlargest(10, 'tweets')[['tag', 'tweets', 'peak_date', 'year']]
print(top_10.to_string(index=False))

In [None]:
# Distribution of tweet volumes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Histogram
ax1.hist(df['tweets'], bins=50, color='#00ff00', edgecolor='white', alpha=0.7)
ax1.axvline(df['tweets'].mean(), color='#ff00ff', linestyle='--', 
            linewidth=2, label=f'Mean: {df["tweets"].mean():,.0f}')
ax1.axvline(df['tweets'].median(), color='#00ffff', linestyle='--', 
            linewidth=2, label=f'Median: {df["tweets"].median():,.0f}')
ax1.set_xlabel('Number of Tweets', fontsize=12, fontweight='bold')
ax1.set_ylabel('Frequency', fontsize=12, fontweight='bold')
ax1.set_title('Distribution of Tweet Volumes', fontsize=14, fontweight='bold')
ax1.legend()
ax1.grid(alpha=0.3)

# Box plot by year
df.boxplot(column='tweets', by='year', ax=ax2, patch_artist=True)
ax2.set_xlabel('Year', fontsize=12, fontweight='bold')
ax2.set_ylabel('Number of Tweets', fontsize=12, fontweight='bold')
ax2.set_title('Tweet Volume Distribution by Year', fontsize=14, fontweight='bold')
plt.suptitle('')  # Remove default title

plt.tight_layout()
plt.savefig('../visualizations/tweet_volume_distribution.png', dpi=300, bbox_inches='tight')
plt.show()
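
If tweet volumes span several orders of magnitude, a linear histogram tends to crowd most values into the first few bins. The sketch below shows two common alternatives on made-up numbers: a log10 transform and percentile summaries. It assumes all counts are positive, since `log10` is undefined at zero.

```python
import numpy as np

# Made-up tweet counts spanning several orders of magnitude
tweets_demo = np.array([1_000, 10_000, 100_000, 1_000_000])

# log10 compresses the long right tail so small and large values share one axis
log_tweets = np.log10(tweets_demo)

# Percentiles describe a skewed distribution better than the mean alone
p50, p90, p99 = np.percentile(tweets_demo, [50, 90, 99])

print("log10 values:", log_tweets)
print(f"Median: {p50:,.0f}  90th pct: {p90:,.0f}  99th pct: {p99:,.0f}")
```

In the plotting cell, calling `ax1.set_xscale('log')` is another option that keeps the original units on the axis.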

## 5. Top Trending Hashtags Analysis

In [None]:
# Top 20 hashtags overall
top_20 = df.nlargest(20, 'tweets')

fig, ax = plt.subplots(figsize=(12, 10))
bars = ax.barh(range(20), top_20['tweets'].values, color='#00ff00', edgecolor='white', linewidth=2)

# Color code by year
colors = {'2020': '#ff0000', '2021': '#ff6600', '2022': '#ffff00', 
          '2023': '#00ff00', '2024': '#00ffff', '2025': '#ff00ff'}
for i, (idx, row) in enumerate(top_20.iterrows()):
    bars[i].set_color(colors.get(str(row['year']), '#00ff00'))

ax.set_yticks(range(20))
ax.set_yticklabels(top_20['tag'].values)
ax.invert_yaxis()
ax.set_xlabel('Number of Tweets', fontsize=14, fontweight='bold')
ax.set_title('Top 20 Trending Hashtags (2020-2025)', fontsize=16, fontweight='bold', pad=20)
ax.grid(axis='x', alpha=0.3, linestyle='--')

# Add value labels
for i, v in enumerate(top_20['tweets'].values):
    ax.text(v, i, f' {v:,.0f}', va='center', fontsize=10)

plt.tight_layout()
plt.savefig('../visualizations/top_20_hashtags.png', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Top hashtags by year
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

years = sorted(df['year'].unique())
for i, year in enumerate(years[:len(axes)]):
    year_data = df[df['year'] == year].nlargest(10, 'tweets')
    n = len(year_data)  # a year may contain fewer than 10 hashtags
    
    axes[i].barh(range(n), year_data['tweets'].values, 
                 color='#00ff00', edgecolor='white', linewidth=1.5)
    axes[i].set_yticks(range(n))
    axes[i].set_yticklabels(year_data['tag'].values, fontsize=9)
    axes[i].invert_yaxis()
    axes[i].set_title(f'Top 10 Trends in {year}', fontsize=12, fontweight='bold')
    axes[i].grid(axis='x', alpha=0.3)

# Hide any panels left empty when fewer than six years are present
for unused_ax in axes[len(years):]:
    unused_ax.set_visible(False)

plt.tight_layout()
plt.savefig('../visualizations/top_trends_by_year.png', dpi=300, bbox_inches='tight')
plt.show()

## 6. Rank Analysis

Analyzing the ranking positions of trending hashtags.

In [None]:
# Rank distribution
print("Rank Statistics:")
print("=" * 50)
print(f"Highest Rank (Best): {df['rank'].min()}")
print(f"Lowest Rank: {df['rank'].max()}")
print(f"Mean Rank: {df['rank'].mean():.1f}")
print(f"Median Rank: {df['rank'].median():.1f}")

# Distribution of ranks
fig, ax = plt.subplots(figsize=(12, 6))
ax.hist(df['rank'], bins=50, color='#00ff00', edgecolor='white', alpha=0.7)
ax.set_xlabel('Rank Position', fontsize=12, fontweight='bold')
ax.set_ylabel('Frequency', fontsize=12, fontweight='bold')
ax.set_title('Distribution of Trending Rank Positions', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('../visualizations/rank_distribution.png', dpi=300, bbox_inches='tight')
plt.show()

## 7. Correlation Analysis

Examining relationships between variables.

In [None]:
# Correlation between rank and tweet volume
correlation = df['rank'].corr(df['tweets'])
print(f"Correlation between Rank and Tweet Volume: {correlation:.3f}")

fig, ax = plt.subplots(figsize=(12, 8))
scatter = ax.scatter(df['rank'], df['tweets'], alpha=0.5, 
                     c=df['year'], cmap='viridis', s=50)
ax.set_xlabel('Rank Position (Lower = Better)', fontsize=12, fontweight='bold')
ax.set_ylabel('Number of Tweets', fontsize=12, fontweight='bold')
ax.set_title('Relationship Between Rank and Tweet Volume', 
             fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

plt.colorbar(scatter, label='Year')
plt.tight_layout()
plt.savefig('../visualizations/rank_tweet_correlation.png', dpi=300, bbox_inches='tight')
plt.show()
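
The coefficient above is Pearson's, which measures linear association and is sensitive to a few very large tweet counts. As a hedged complement, the sketch below computes Spearman's rank correlation as Pearson's correlation on the ranks of each variable, which avoids any dependency beyond pandas. The data is made up.

```python
import pandas as pd

# Made-up data: tweet volume falls steeply, but not linearly, as rank worsens
demo = pd.DataFrame({
    'rank':   [1, 2, 3, 4, 5],
    'tweets': [900_000, 50_000, 40_000, 3_000, 1_000],
})

# Pearson: linear association on the raw values
pearson = demo['rank'].corr(demo['tweets'])

# Spearman: Pearson computed on the ranks of each variable
spearman = demo['rank'].rank().corr(demo['tweets'].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because lower rank numbers are better, a negative value in either measure means that better-placed hashtags tend to have more tweets.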

## 8. Heatmap: Trends Over Time

In [None]:
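# Create heatmap of trending activity by year and month.
# Reindex to all 12 months so the month labels set below always line up,
# even if some month has no entries in the data.
heatmap_data = (df.groupby(['year', 'month']).size()
                  .unstack(fill_value=0)
                  .reindex(columns=range(1, 13), fill_value=0))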

fig, ax = plt.subplots(figsize=(14, 6))
sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='YlGnBu', 
            linewidths=0.5, ax=ax, cbar_kws={'label': 'Number of Trends'})
ax.set_xlabel('Month', fontsize=12, fontweight='bold')
ax.set_ylabel('Year', fontsize=12, fontweight='bold')
ax.set_title('Trending Activity Heatmap (Year vs Month)', 
             fontsize=14, fontweight='bold', pad=20)

# Set month labels
month_labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax.set_xticklabels(month_labels)

plt.tight_layout()
plt.savefig('../visualizations/trends_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()

## 9. Major Events Analysis

Identifying hashtags related to major world events.

In [None]:
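# Define keywords for major event categories.
# Multi-word keywords are written without spaces because tags are
# normalized (lowercased, non-alphanumerics removed) before matching.
import re

event_keywords = {
    'Politics': ['trump', 'biden', 'election', 'vote', 'congress', 'president'],
    'COVID-19': ['covid', 'corona', 'pandemic', 'vaccine', 'lockdown'],
    'International': ['ukraine', 'russia', 'china', 'israel', 'gaza', 'iran'],
    'Entertainment': ['oscars', 'grammy', 'superbowl', 'worldcup', 'olympics'],
    'Technology': ['ai', 'chatgpt', 'elon', 'meta', 'twitter', 'tesla'],
    'Social': ['blm', 'metoo', 'pride', 'equality', 'justice']
}

# Very short keywords must equal the whole tag; as substrings they
# over-match (for example, 'ai' appears inside 'spain' and 'rain').
exact_only = {'ai'}

# Categorize hashtags (the first matching category wins)
def categorize_tag(tag):
    tag_clean = re.sub(r'[^a-z0-9]', '', str(tag).lower())
    for category, keywords in event_keywords.items():
        for keyword in keywords:
            if keyword in exact_only:
                if tag_clean == keyword:
                    return category
            elif keyword in tag_clean:
                return category
    return 'Other'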

df['category'] = df['tag'].apply(categorize_tag)

# Count by category
category_counts = df['category'].value_counts()
print("\nHashtag Categories:")
print(category_counts)
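
Because substring matching can also fire inside unrelated words, it may help to record which keyword triggered each match and review the hits by hand. The sketch below does this with a plain substring check. The helper `first_match`, the reduced keyword table, and the sample tags are all hypothetical and chosen only to illustrate the audit.

```python
# Hypothetical subset of the keyword table, for illustration only
keywords_demo = {
    'Technology': ['ai', 'elon'],
    'Entertainment': ['oscars'],
}

def first_match(tag, keyword_table):
    """Return (category, keyword) for the first substring hit, else None."""
    tag_lower = str(tag).lower()
    for category, keywords in keyword_table.items():
        for keyword in keywords:
            if keyword in tag_lower:
                return category, keyword
    return None

# Made-up tags: the first two are false positives for 'Technology'
for tag in ['#Barcelona', '#Spain', '#Oscars2024', '#MondayMotivation']:
    print(tag, '->', first_match(tag, keywords_demo))
```

Here `#Barcelona` matches through `elon` and `#Spain` through `ai`, which is the kind of hit a manual review would flag.
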

In [None]:
# Visualize categories
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Bar chart
colors_cat = ['#ff0000' if cat == 'Other' else '#00ff00' for cat in category_counts.index]
ax1.bar(range(len(category_counts)), category_counts.values, 
        color=colors_cat, edgecolor='white', linewidth=2)
ax1.set_xticks(range(len(category_counts)))
ax1.set_xticklabels(category_counts.index, rotation=45, ha='right')
ax1.set_ylabel('Count', fontsize=12, fontweight='bold')
ax1.set_title('Hashtag Distribution by Category', fontsize=14, fontweight='bold')
ax1.grid(axis='y', alpha=0.3)

# Pie chart (excluding 'Other')
category_counts_filtered = category_counts[category_counts.index != 'Other']
ax2.pie(category_counts_filtered.values, labels=category_counts_filtered.index,
        autopct='%1.1f%%', startangle=90)
ax2.set_title('Category Distribution (Excluding "Other")', 
              fontsize=14, fontweight='bold')

plt.tight_layout()
plt.savefig('../visualizations/category_analysis.png', dpi=300, bbox_inches='tight')
plt.show()
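
Overall counts do not show whether a category's presence is sustained or concentrated in one year. One way to look at this, sketched below on made-up rows, is a row-normalized cross-tabulation of `year` against `category`, so that each row gives the category shares within that year.

```python
import pandas as pd

# Made-up rows standing in for the categorized `df`
demo = pd.DataFrame({
    'year':     [2020, 2020, 2020, 2020, 2021, 2021],
    'category': ['COVID-19', 'COVID-19', 'Politics', 'Other', 'Politics', 'Other'],
})

# normalize='index' divides each row by its total, so every row sums to 1
share_by_year = pd.crosstab(demo['year'], demo['category'], normalize='index')

print(share_by_year.round(2))
```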

## 10. Key Insights and Conclusions

### Summary of Findings

Based on our analysis of 12,036 trending hashtags from 2020-2025, we've discovered:

1. **Temporal Patterns:**
   - Trending activity varies significantly by year, reflecting global events
   - Certain months show higher trending activity (the top three are highlighted in the Section 3.2 chart)
   - Weekday and weekend counts (Section 3.3) can be compared to look for differences in user behavior

2. **Content Themes:**
   - Political events dominate trending conversations
   - Major global crises (COVID-19, international conflicts) generate sustained engagement
   - Entertainment and technology topics maintain consistent presence

3. **Engagement Metrics:**
   - Tweet volumes vary widely, from thousands to millions
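   - Rank position and tweet volume are related; because lower rank numbers are better, a negative coefficient in Section 7 means better-placed hashtags draw more tweets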
   - Top trending hashtags often coincide with breaking news events

4. **Strategic Implications:**
   - Timing matters: certain periods show higher viral potential
   - Event-driven content generates maximum engagement
   - Understanding trending patterns can inform content strategy

### Next Steps

Future analyses could explore:
- Sentiment analysis of trending hashtags
- Geographic distribution of trends
- Predictive modeling for trend forecasting
- Network analysis of related hashtags

---

**Project completed by Monse Rojo**  
GitHub: [@cyberyimi](https://github.com/cyberyimi)  
LinkedIn: [Monse Rojo](https://www.linkedin.com/in/monse-rojo-6b70b3397/)

In [None]:
# Export key statistics for portfolio summary
# 'Other' is the catch-all bucket, so exclude it when naming the top category
named_categories = category_counts.drop('Other', errors='ignore')

summary_stats = {
    'Total Hashtags Analyzed': len(df),
    'Date Range': f"{df['peak_date'].min().date()} to {df['peak_date'].max().date()}",
    'Years Covered': df['year'].nunique(),
    'Average Tweets per Hashtag': f"{df['tweets'].mean():,.0f}",
    'Max Tweets': f"{df['tweets'].max():,.0f}",
    'Most Common Category': named_categories.index[0] if len(named_categories) else 'Other'
}

print("\n" + "="*60)
print("PROJECT SUMMARY STATISTICS")
print("="*60)
for key, value in summary_stats.items():
    print(f"{key}: {value}")
print("="*60)
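
The cell above prints the statistics but does not write them anywhere. If a file is wanted for the portfolio summary, a minimal sketch using the standard library's `json` module could look like the following. The helper `save_summary`, the demo values, and the output path are assumptions for illustration.

```python
import json
import os
import tempfile

def save_summary(stats, path):
    """Write a dict of summary statistics to `path` as UTF-8 JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        # default=str converts values json cannot serialize natively
        # (for example NumPy integers) into strings
        json.dump(stats, f, indent=2, ensure_ascii=False, default=str)

# Made-up values standing in for `summary_stats`
demo_stats = {'Total Hashtags Analyzed': 3, 'Most Common Category': 'Politics'}
demo_path = os.path.join(tempfile.gettempdir(), 'summary_stats_demo.json')
save_summary(demo_stats, demo_path)

# Read the file back to confirm the round trip
with open(demo_path, encoding='utf-8') as f:
    loaded_stats = json.load(f)
print(loaded_stats == demo_stats)  # True
```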