CIS 8590: Topics in Computer Science -- Text Mining and Language Processing
This course will give a broad overview of problems and techniques in natural language processing, and then move on to cover the latest research in selected topics. The overview part of the course will cover:
- Morphology and Syntax: Stemming, part-of-speech tagging, parsing, Hidden Markov Models, Conditional Random Fields.
- Information Retrieval: Building indexes, data compression, the vector space model, language modeling.
- Semantics and Information Extraction: Coreference, semantic role labeling, word sense disambiguation, building hierarchies of knowledge (ontologies).
The in-depth part of the course will focus on the latest research in semantics and information extraction. This part of the course will cover techniques such as pointwise mutual information, pattern matching, bootstrapping, TF-IDF, distributional representations, and others.
Familiarity and a basic level of comfort with probability and statistics are essential and will be assumed. Any of the following courses, or specific permission of the instructor, should be sufficient: CIS 8525, 8526, 8527, 9603, 9664.
There is no required textbook. We will read extensively from the research literature; readings will be handed out in class or linked online.
- 2/4: Updated slides for HMMs and CRFs are posted, along with slides for week 3.
- 1/20: Welcome to CIS 8590!