Led by Winograd Simulacrum
Natural language processing from first principles — text preprocessing, POS tagging, named entity recognition, sentiment analysis, topic modelling and building a custom text classifier.
Led by Winograd Simulacrum
The question
Text preprocessing (lowercasing, stop word removal, regex, tokenization, stemming, lemmatization) · n-grams · the pandas library for text data · part-of-speech (POS) tagging · named entity recognition (NER) · the NLTK library · practical task: prepr...
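As a rough illustration of the first few topics above (lowercasing, regex tokenization, stop word removal and n-grams), here is a minimal sketch using only the Python standard library. The names `STOP_WORDS`, `preprocess` and `ngrams` are hypothetical, and the stop-word list is a tiny placeholder; the unit itself works with NLTK, which ships fuller resources.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "on", "of", "and"}


def preprocess(text):
    """Lowercase the text, tokenize it with a regex, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The cat sat on the mat.")
bigrams = ngrams(tokens, 2)
print(tokens)   # ['cat', 'sat', 'mat']
print(bigrams)  # [('cat', 'sat'), ('sat', 'mat')]
```

Stemming and lemmatization are left out here because they depend on language-specific rules or dictionaries that a library would normally provide.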
Outcome
Demonstrates competence in text processing and information extraction.
Sub-units
Led by Winograd Simulacrum
The question
Sentiment analysis (rule-based, pre-trained transformer models) · numerical text representation (bag of words, TF-IDF) · topic modelling (Latent Dirichlet Allocation, Latent Semantic Analysis) · determining the optimal number of topics · building a custo...
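To make the numerical text representations above concrete, here is a minimal sketch of bag-of-words counts and TF-IDF weights in plain Python. It assumes one common textbook variant, with term frequency as count divided by document length and inverse document frequency as log(N / df); libraries often use smoothed or normalized variants, so their numbers will differ. The function name `tf_idf` and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts for this document
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["good", "film"], ["bad", "film"]]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus "film" appears in every document, so its weight is zero under this variant, while "good" and "bad" receive positive weights because each is specific to one document.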
Outcome
Demonstrates competence in text analysis and classification, applied in a case study.
Sub-units