CIS 8590: Topics in Computer Science -- Text Mining and Language Processing
Additional information about this course may be found on the Web at
Lecture Time: Thursdays, 4:40pm to 7:10pm in Tuttleman 1B
Instructor : Alexander Yates
- Office : Wachman Hall, Room 303A
- E-Mail :
- Contact Hours:
Thursdays from 2:00pm to 4:00pm,
or by appointment, or drop by and see if I'm in.
- Each student should have a general Temple email address, usually of the form firstName.LastName@temple.edu
- Important student information is accessible from http://owlnet.temple.edu/
- The last day to drop the course (with a tuition refund) is Monday, February 1, 2010.
The last day to withdraw from the course (no refund) is Monday,
March 29. Students who have previously withdrawn from this course, or who
have already withdrawn from 5 courses since January 2005, may not withdraw.
- Any student who has a need for accommodation based on the impact of a
disability should contact me privately to discuss the specific situation
as soon as possible. Students with documented disabilities should contact
Disability Resources and Services at
215-204-1280 in 100 Ritter Hall to coordinate reasonable accommodations.
- Freedom to teach and freedom to learn are inseparable facets
of academic freedom. The University has adopted a policy on
Student and Faculty Academic Rights and Responsibilities
(Policy # 03.70.02) which can be accessed through the
- Students should be familiar with the University statement on academic
honesty found at the following link
- The grievance procedures are available online
A general familiarity and a basic level of comfort with probability and statistics are essential and will be assumed. Any of the following courses, or specific permission of the instructor, should suffice: CIS 8525, 8526, 8527, 9603, 9664.
There is no textbook for this course. We will be reading extensively from the research literature.
This course will give a broad overview of problems and techniques in natural language processing, and then move on to cover the latest research in selected topics. The overview part of the course will cover problems in:
- Morphology and Syntax: Stemming, part-of-speech-tagging, parsing, Hidden Markov Models, Conditional Random Fields.
- Information Retrieval: Building indexes, data compression, the vector space model, language modeling.
- Semantics and Information Extraction: Coreference, semantic role labeling, word sense disambiguation, building hierarchies of knowledge (ontologies).
The in-depth part of the course will focus on the latest research in semantics and information extraction. This part of the course will cover such techniques as pointwise mutual information, pattern-matching, bootstrapping, TF-IDF, distributional representations, and others.
- Midterm Exam: 50%
- Final Project: 30%
- In-class Participation: 20%
EXAMS AND QUIZZES
All exams and quizzes are closed book. Their content is cumulative, i.e., they cover
the material from the entire semester up to the day of the exam.
If a student misses the midterm without prior
agreement and without documented proof of a medical or legal reason,
he or she will receive a zero for that exam. Missed quizzes
will not be made up.
Several project ideas will be suggested during the semester, but students are free to propose their own, especially if they relate to their current research. Students will be expected to come up with novel solutions to problems in text mining and language processing.
Course projects will be undertaken individually or in small teams (2-3 students).
More information on course projects will be provided soon.