SVM vs BERT: Sentiment Analysis

A comparison of SVM and BERT on noisy review text to see which one holds up better.

Problem

Real user text is messy. Typos, slang, and missing punctuation can break simpler models fast.

Approach

Trained an SVM and a fine-tuned BERT model on the 50k-review IMDB dataset, then injected controlled noise to measure how much each model's accuracy degraded.
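The noise-injection step can be sketched as a small corruption function. This is a minimal illustration, not the project's actual code: the function name, the `rate` parameter, and the specific corruption choices are assumptions that match the three noise types described (typos, word swaps, missing punctuation).

```python
import random

def add_noise(text, rate=0.1, seed=0):
    """Corrupt text with character typos, adjacent-word swaps, and
    punctuation removal. `rate` is the fraction of words affected
    (illustrative parameter, not from the project)."""
    rng = random.Random(seed)
    words = text.split()
    for i, w in enumerate(words):
        if rng.random() >= rate:
            continue
        kind = rng.choice(["typo", "swap", "punct"])
        if kind == "typo" and len(w) > 3:
            # Transpose two adjacent characters to simulate a typo.
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
        elif kind == "swap" and i + 1 < len(words):
            # Swap this word with its neighbor.
            words[i], words[i + 1] = words[i + 1], words[i]
        elif kind == "punct":
            # Strip trailing/leading punctuation.
            words[i] = w.strip(".,!?")
    return " ".join(words)
```

Keeping the corruption rate as a parameter makes it easy to sweep noise levels and plot accuracy degradation curves for both models.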

Impact

BERT stayed accurate on noisy text while the SVM's performance dropped off much faster, making the robustness-versus-cost tradeoff clear.

Tech Stack

BERT, SVM, Pandas, NumPy, Scikit-learn, PyTorch, Python

Overview

This project asks a simple question: what happens when sentiment models leave clean benchmark data and hit text that looks more like the real internet?

Method

  1. Start with the IMDB dataset of 50k reviews.
  2. Train both SVM and BERT on clean text.
  3. Add noise like typos, word swaps, and missing punctuation.
  4. Measure how accuracy changes.
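The SVM side of the pipeline (steps 2 and 4) can be sketched with scikit-learn. This is a toy sketch on a four-sentence corpus, not the project's code: the real experiment used the 50k IMDB reviews, and the choice of TF-IDF features with `LinearSVC` is an assumption about how the SVM was set up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative corpus; the actual project trained on 50k IMDB reviews.
train_texts = [
    "a wonderful uplifting film",
    "terrible boring mess",
    "great acting and story",
    "awful script and pacing",
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Step 2: train the SVM on clean text.
svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(train_texts, train_labels)

# Step 4: compare predictions on clean vs. noisy versions of the same input.
clean = ["wonderful story"]
noisy = ["wonderfl story"]  # simulated typo: unseen token falls out of the vocabulary
print(svm.predict(clean), svm.predict(noisy))
```

The typo turns "wonderful" into an out-of-vocabulary token, so the TF-IDF vector loses that signal entirely, which is exactly why bag-of-words SVMs degrade quickly on noisy input while BERT's subword tokenizer retains partial information.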

Result

On clean data, both models performed well. Once the text got noisy, BERT held up much better while SVM dropped sharply.

Takeaway

If the input text is user-generated and messy, the extra cost of BERT is usually worth it.