SVM vs BERT: Sentiment Analysis

A comparison of SVM and BERT on noisy review text to see which one holds up better.

Problem

Real user text is messy. Typos, slang, and missing punctuation can break simpler models fast.

Approach

Trained an SVM and a fine-tuned BERT model on the 50k-review IMDB dataset, then injected controlled noise to measure how much each model's accuracy degraded.
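The noise-injection step can be sketched as a small corruption function. This is a minimal illustration, not the project's actual code: the function name, the `rate` parameter, and the specific corruption choices are assumptions that match the three noise types described (typos, word swaps, missing punctuation).

```python
import random

def add_noise(text, rate=0.1, seed=0):
    """Corrupt text with character typos, adjacent-word swaps, and
    punctuation removal. `rate` is the fraction of words affected
    (illustrative parameter, not from the project)."""
    rng = random.Random(seed)
    words = text.split()
    for i, w in enumerate(words):
        if rng.random() >= rate:
            continue
        kind = rng.choice(["typo", "swap", "punct"])
        if kind == "typo" and len(w) > 3:
            # Transpose two adjacent characters to simulate a typo.
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
        elif kind == "swap" and i + 1 < len(words):
            # Swap this word with its neighbor.
            words[i], words[i + 1] = words[i + 1], words[i]
        elif kind == "punct":
            # Strip trailing/leading punctuation.
            words[i] = w.strip(".,!?")
    return " ".join(words)
```

Keeping the corruption rate as a parameter makes it easy to sweep noise levels and plot accuracy degradation curves for both models.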

Impact

BERT stayed accurate on noisy text while the SVM's performance dropped off much faster, making the robustness-versus-cost tradeoff clear.

Tech Stack

BERT, SVM, Pandas, NumPy, Scikit-learn, PyTorch, Python

Overview

This project asks a simple question: what happens when sentiment models leave clean benchmark data and hit text that looks more like the real internet?

Method

  1. Start with the IMDB dataset of 50k reviews.
  2. Train both SVM and BERT on clean text.
  3. Add noise like typos, word swaps, and missing punctuation.
  4. Measure how accuracy changes.
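The SVM side of the pipeline (steps 2 and 4) can be sketched with scikit-learn. This is a toy sketch on a four-sentence corpus, not the project's code: the real experiment used the 50k IMDB reviews, and the choice of TF-IDF features with `LinearSVC` is an assumption about how the SVM was set up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative corpus; the actual project trained on 50k IMDB reviews.
train_texts = [
    "a wonderful uplifting film",
    "terrible boring mess",
    "great acting and story",
    "awful script and pacing",
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Step 2: train the SVM on clean text.
svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(train_texts, train_labels)

# Step 4: compare predictions on clean vs. noisy versions of the same input.
clean = ["wonderful story"]
noisy = ["wonderfl story"]  # simulated typo: unseen token falls out of the vocabulary
print(svm.predict(clean), svm.predict(noisy))
```

The typo turns "wonderful" into an out-of-vocabulary token, so the TF-IDF vector loses that signal entirely, which is exactly why bag-of-words SVMs degrade quickly on noisy input while BERT's subword tokenizer retains partial information.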

Result

On clean data, both models performed well. Once the text got noisy, BERT held up much better while SVM dropped sharply.

Takeaway

If the input text is user-generated and messy, the extra cost of BERT is usually worth it.