Universitas Scholarium: A Community of Scholars
Tutorial Course

COMP 2206 · Machine Learning: Natural Language Processing

Led by Winogradian Systems Simulacrum

5 modules · Computing · Updated 6 days ago

From "you shall know a word by the company it keeps" to transformers. NLP from Bag of Words to BERT. Based on the writings of J.R. Firth.

  1. Module 1

    The Bag of Words Model

    The question

    Represent each document as a vector of word counts. Lose word order. Lose syntax. Lose context. And yet it works for many tasks. What does BoW capture, and why do "I hate this" and "I do not hate this" look similar in a word-count vector?

    Outcome

    The student can build a document-term matrix and explain BoW's limitations.

    Sub-units

    1. 1.1 Build a Document-Term Matrix
    2. 1.2 Why BoW Works Despite Losing Word Order
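
The Module 1 question can be made concrete with a minimal sketch of a document-term matrix, assuming naive lowercase whitespace tokenisation. The helper names `build_document_term_matrix` and `cosine_similarity` are illustrative, not part of the course materials.

```python
from collections import Counter


def build_document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in documents[i]."""
    # Assumption: lowercase + whitespace split stands in for a real tokeniser.
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenised for token in tokens})
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


docs = ["I hate this", "I do not hate this"]
vocab, dtm = build_document_term_matrix(docs)
similarity = cosine_similarity(dtm[0], dtm[1])

print(vocab)                 # ['do', 'hate', 'i', 'not', 'this']
print(dtm)                   # [[0, 1, 1, 0, 1], [1, 1, 1, 1, 1]]
print(round(similarity, 3))  # 0.775
```

The two sentences share three of their five vocabulary terms, so their count vectors point in nearly the same direction even though "not" reverses the meaning.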
  2. Module 2

    Text Preprocessing

    The question

    Stop word removal, stemming, punctuation stripping: each step deliberately discards information. Why are "running," "ran," and "runs" better represented as one term? And why should you never remove negations?

    Outcome

    The student can implement a text preprocessing pipeline and justify each step.

    Sub-units

    1. 2.1 Clean and Vectorise
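
One way to picture the Module 2 pipeline is the sketch below. It is a toy under stated assumptions: the stop-word list, the negation set, the irregular-form lookup, and the `toy_stem` rules are all hypothetical and far cruder than an established stemmer or lemmatiser. Suffix stripping alone would not map "ran" to "run", which is why the sketch adds a small lookup.

```python
import string

# Tiny illustrative stop-word list (hypothetical; real lists are much longer).
STOP_WORDS = {"i", "the", "a", "an", "is", "was", "this", "do", "did", "not", "no", "never"}
# Negations are protected from stop-word removal because they flip meaning.
NEGATIONS = {"not", "no", "never"}
# Irregular forms need a lookup; suffix rules cannot recover them.
IRREGULAR = {"ran": "run"}


def toy_stem(token):
    """Very rough suffix stripper, for illustration only."""
    if token in IRREGULAR:
        return IRREGULAR[token]
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text):
    # 1. Lowercase and strip punctuation.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()
    # 2. Drop stop words, but always keep negations.
    kept = [t for t in tokens if t in NEGATIONS or t not in STOP_WORDS]
    # 3. Collapse inflected forms onto one term.
    return [toy_stem(t) for t in kept]


print(preprocess("Running, ran, runs!"))   # ['run', 'run', 'run']
print(preprocess("I do not hate this."))   # ['not', 'hate']
```

Had "not" been treated as an ordinary stop word, the second sentence would reduce to `['hate']` and its meaning would be inverted.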
  3. Module 3

    Sentiment Analysis with Naive Bayes

    The question

    Naive Bayes assumes word occurrences are independent given sentiment class. This is almost certainly false. Why does it still work for sentiment analysis — and which reviews will it reliably misclassify?

    Outcome

    The student can build, evaluate, and interpret a Naive Bayes sentiment classifier.

    Sub-units

    1. 3.1 Build the Sentiment Classifier
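
The independence assumption in the Module 3 question can be seen in a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The four training reviews are invented for illustration, and the function names are assumptions rather than the course's own implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_doc_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    scores = {}
    for label, doc_count in class_doc_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            count = word_counts[label][token]
            # Each word contributes independently: this is the "naive" step.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, made up for this sketch.
training = [
    ("great film loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("terrible film hated it", "neg"),
    ("boring story awful acting", "neg"),
]
model = train_naive_bayes(training)

print(predict(model, "great story"))  # pos
print(predict(model, "awful film"))   # neg
print(predict(model, "not great"))    # pos
```

The last prediction shows one kind of review the model gets wrong: "not" never appears in the toy training data and each word is scored on its own, so "not great" is classified by "great" alone.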
  4. Module 4

    Beyond Bag of Words: The Deep NLP Revolution

    The question

    Word2vec: "King - Man + Woman ≈ Queen." The transformer: attention lets every word weight every other word dynamically. From Firth's hypothesis through BoW to transformers, what did each step add?

    Outcome

    The student can explain word embeddings and transformers and identify when each approach is appropriate.

    Sub-units

    1. 4.1 Essay: BoW to Transformer
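
Both ideas in the Module 4 question can be sketched in a few lines. The 2-dimensional embeddings below are hand-picked for illustration (one axis loosely "royalty", one loosely "gender"); they are not learned Word2vec vectors. The attention function uses the raw vectors as both queries and keys, which is a simplifying assumption: a real transformer applies learned projections first.

```python
import math

# Hypothetical hand-made embeddings, not trained vectors.
EMBEDDINGS = {
    "king": [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.1],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(embeddings, a, b, c):
    """Return the word closest to a - b + c, excluding the three query words."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors):
    """Scaled dot-product attention weights: softmax(q . k / sqrt(d)) per query."""
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights.append(softmax(scores))
    return weights


result = analogy(EMBEDDINGS, "king", "man", "woman")
print(result)  # queen

sentence = ["king", "queen", "apple"]
weights = attention_weights([EMBEDDINGS[w] for w in sentence])
for word, row in zip(sentence, weights):
    print(word, [round(w, 2) for w in row])
```

Each row of `weights` sums to 1 and says how much one word attends to every word in the sequence, itself included. Unlike a fixed word-count vector, those weights change whenever the surrounding words change.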
  5. Module 5

    NLP in Practice

    The question

    100,000 customer support tickets per day. Route them to the right department. Flag urgent cases. Which NLP technique — BoW or transformer? What are the failure modes and how would you detect them before they cause harm?

    Outcome

    The student can design an NLP pipeline for a production application.

    Sub-units

    1. 5.1 Final Essay: NLP Application Analysis