Led by Winogradian Systems Simulacrum
From "you shall know a word by the company it keeps" to transformers. NLP from Bag of Words to BERT. Based on the writings of J.R. Firth.
The question
Represent each document as a vector of word counts. Lose word order. Lose syntax. Lose context. And yet it works for many tasks. What does BoW capture, and why do "I hate this" and "I do not hate this" look similar in a word-count vector?
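As a minimal sketch of the question above, the following builds a document-term matrix for the two example sentences using only the Python standard library. The helper names (`tokenize`, `document_term_matrix`, `cosine`) are illustrative assumptions, not part of the course material, and whitespace tokenization is a deliberate simplification.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def document_term_matrix(docs):
    # One row per document, one column per vocabulary term (sorted alphabetically).
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def cosine(u, v):
    # Cosine similarity between two count vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["I hate this", "I do not hate this"]
vocab, matrix = document_term_matrix(docs)
similarity = cosine(matrix[0], matrix[1])

print(vocab)       # ['do', 'hate', 'i', 'not', 'this']
print(matrix)      # [[0, 1, 1, 0, 1], [1, 1, 1, 1, 1]]
print(similarity)  # about 0.77
```

The two sentences share three of five terms, so their vectors point in nearly the same direction even though the meanings are opposite. The word "not" is just one more count; nothing in the representation ties it to "hate".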
Outcome
The student can build a document-term matrix and explain BoW's limitations.
Sub-units
The question
Stop word removal, stemming, lemmatization, punctuation stripping: each step discards information deliberately. Why are "running," "ran," and "runs" better represented as one term? And why should you never remove negations?
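One way to see what each step discards is a small pipeline sketch. Everything here is an assumption made for illustration: the stop word list is invented, and `crude_stem` is a toy suffix stripper, not the Porter algorithm or any library's stemmer.

```python
import string

# Hypothetical stop word list; note that it contains negations.
STOP_WORDS = {"i", "the", "a", "an", "is", "was", "this", "do", "and", "not", "no"}
NEGATIONS = {"not", "no", "never"}

def crude_stem(token):
    # Toy suffix stripping: drop "ing" or "s", then undo a doubled final consonant.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return token

def preprocess(text, keep_negations=True):
    # Step 1: lowercase and strip punctuation.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = []
    for tok in text.split():
        # Step 2: drop stop words, optionally protecting negations.
        if tok in STOP_WORDS and not (keep_negations and tok in NEGATIONS):
            continue
        # Step 3: stem what remains.
        tokens.append(crude_stem(tok))
    return tokens

print(preprocess("Running, ran, and runs."))                    # ['run', 'ran', 'run']
print(preprocess("I do not hate this!"))                        # ['not', 'hate']
print(preprocess("I do not hate this!", keep_negations=False))  # ['hate']
print(preprocess("I hate this."))                               # ['hate']
```

Two things show up in the output. Suffix stripping merges "running" and "runs" but leaves the irregular form "ran" alone; mapping it to "run" needs a dictionary-based lemmatizer. And once negations are removed as stop words, "I do not hate this" and "I hate this" become the same token list.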
Outcome
The student can implement a text preprocessing pipeline and justify each step.
Sub-units
The question
Naive Bayes assumes word occurrences are independent given sentiment class. This is almost certainly false. Why does it still work for sentiment analysis — and which reviews will it reliably misclassify?
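The independence assumption can be made concrete with a small multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The six training reviews are invented for illustration, and the function names `train_nb` and `predict_nb` are assumptions rather than any library's API.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples, alpha=1.0):
    # Count documents per class and word occurrences per class.
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab, alpha

def predict_nb(model, text):
    class_docs, word_counts, vocab, alpha = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        # Log prior plus a sum of per-word log likelihoods:
        # each word contributes independently of its neighbours.
        score = math.log(class_docs[label] / total_docs)
        total = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][tok] + alpha) / (total + alpha * len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set.
train = [
    ("good film", "pos"),
    ("great plot", "pos"),
    ("not a dull moment", "pos"),
    ("bad film", "neg"),
    ("terrible plot", "neg"),
    ("not a single laugh", "neg"),
]
model = train_nb(train)

print(predict_nb(model, "good film"))  # pos
print(predict_nb(model, "not good"))   # pos, although a reader would call it negative
print(predict_nb(model, "not bad"))    # neg, although a reader would call it positive
```

In this toy data "not" appears equally often in both classes, so it carries no evidence on its own, and the classifier scores "good" and "bad" as if the negation were absent. Negated reviews are one family of inputs such a model can be expected to misclassify; sarcasm is plausibly another.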
Outcome
The student can build, evaluate, and interpret a Naive Bayes sentiment classifier.
Sub-units
The question
Word2vec: "King - Man + Woman ≈ Queen" in vector arithmetic. The transformer: attention allows every word to weight every other word dynamically. From Firth's hypothesis through BoW to transformers, what did each step add?
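Both ideas can be sketched in a few lines. The two-dimensional vectors below are hand-made assumptions (one axis loosely "royalty", the other loosely "gender"), not learned word2vec embeddings, and `self_attention` is a bare scaled dot-product attention with queries, keys, and values all equal to the input vectors, omitting the learned projections a real transformer uses.

```python
import math

# Hand-made toy vectors: [royalty, gender]. Real embeddings are learned from text.
EMB = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(a, b, c):
    # Nearest word (by cosine) to a - b + c, excluding the three input words.
    target = [x - y + z for x, y, z in zip(EMB[a], EMB[b], EMB[c])]
    candidates = [w for w in EMB if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(EMB[w], target))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Scaled dot-product attention with Q = K = V = vectors.
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all input vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, vectors)) for j in range(d)])
    return outputs, weights

print(analogy("king", "man", "woman"))  # queen

sentence = [EMB["king"], EMB["apple"]]
outputs, weights = self_attention(sentence)
print(weights[0])  # "king" attends more to itself than to "apple"
```

The contrast worth noticing is that an embedding table gives each word one fixed vector, while attention recomputes each word's output from whatever other words are present, so the same word gets a different representation in a different sentence.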
Outcome
The student can explain word embeddings and transformers and identify when each approach is appropriate.
Sub-units
The question
100,000 customer support tickets per day. Route them to the right department. Flag urgent cases. Which NLP technique — BoW or transformer? What are the failure modes and how would you detect them before they cause harm?
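As a starting point for the design discussion, here is a sketch of a keyword-count router that abstains when it has no clear winner and reports how often it abstains. The departments, keyword lists, urgency terms, and function names are all hypothetical; a real system would learn these from labelled tickets rather than hard-code them.

```python
import re
from collections import Counter

# Hypothetical departments and keyword lists, invented for illustration.
DEPARTMENT_KEYWORDS = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "technical": {"error", "crash", "login", "bug"},
}
URGENT_TERMS = {"urgent", "outage", "immediately"}

def route_ticket(text, min_hits=1):
    # Bag-of-words features: lowercase word counts, punctuation ignored.
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    hits = {dept: sum(tokens[k] for k in kws) for dept, kws in DEPARTMENT_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    ranked = sorted(hits.values(), reverse=True)
    # Abstain when nothing matched or the top two departments tie.
    ambiguous = ranked[0] < min_hits or (len(ranked) > 1 and ranked[0] == ranked[1])
    return {
        "department": "human_review" if ambiguous else best,
        "urgent": any(term in tokens for term in URGENT_TERMS),
    }

def review_rate(results):
    # Share of tickets sent to human review; a rising rate is one drift signal.
    if not results:
        return 0.0
    return sum(r["department"] == "human_review" for r in results) / len(results)

tickets = [
    "Urgent: payment failed, please refund the charge",
    "hello there",
    "login error after refund payment",
]
results = [route_ticket(t) for t in tickets]
for r in results:
    print(r)
print(review_rate(results))
```

The design choice illustrated is the abstain path: a router that always picks a department fails silently, while one that can say "human_review" gives a measurable quantity to monitor. Tracking that rate over time, alongside spot checks of confidently routed tickets, is one way to detect failures before they cause harm.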
Outcome
The student can design an NLP pipeline for a production application.
Sub-units