Temple University, CIS

CIS 8538: Text Mining and Language Processing

Instructor: Alexander Yates

Course Description:

This course will give a broad overview of problems and techniques in natural language processing and text mining, and then move on to cover the latest research in selected topics. The overview part of the course will cover topics in:

  • Document classification and ranking: Building indexes, vector space model, similarity functions, feature selection and dimensionality reduction, classification, regression, and ranking methods.
  • Language modeling: Inducing latent structures, Hidden Markov Models, dependency trees, latent Dirichlet allocation.
  • Sequence labeling: Applications (especially information extraction), supervised techniques for structured prediction, representations, and distributional similarity.

The in-depth part of the course will focus on the latest research in topics like domain adaptation, unsupervised and self-supervised information extraction, and knowledge acquisition.

Prerequisites:

Familiarity and a basic level of comfort with probability and statistics are essential and will be assumed. Any of the following courses, or specific permission of the instructor, should be sufficient preparation: CIS 8525, 8526, 8527, 9603, 9664.

Textbook:

Introduction to Information Retrieval, by Manning, Raghavan, and Schuetze. Available free online. We will also read extensively from the research literature; these readings will either be handed out in class or linked online.
Announcements:

  • 1/20: Welcome to CIS 8538!