CIS 8590: Topics in Computer Science -- Text Mining and Language Processing
[Spring 2008]
Additional information about this course may be found on the Web at
http://knight.cis.temple.edu/~yates/cis8590/.
Lecture Time: Thursdays: 4:40pm to 7:10pm in Tuttleman 1B
Instructor : Alexander Yates
- Office : Wachman Hall, Room 303A
- E-Mail :
- Contact Hours:
Thursdays from 2:00pm to 4:00pm,
or by appointment, or drop by and see if I'm in.
Miscellaneous:
- Each student should have a general Temple email address, usually of the form firstName.LastName@temple.edu
- Important student information is accessible from http://owlnet.temple.edu/
- The last day to drop the course (and receive a tuition refund) is Monday, September 15, 2008.
The last day to withdraw from the course (no refund) is Monday,
November 3, 2008. Students who have previously withdrawn from this course, or who
have already withdrawn from 5 courses since September 2003 may not withdraw.
- Any student who needs accommodation based on the impact of a
disability should contact me privately to discuss the specific situation
as soon as possible. Students with documented disabilities should contact
Disability Resources and Services at
215-204-1280 in 100 Ritter Hall to coordinate reasonable accommodations.
- Freedom to teach and freedom to learn are inseparable facets
of academic freedom. The University has adopted a policy on
Student and Faculty Academic Rights and Responsibilities
(Policy # 03.70.02) which can be accessed through the
following link:
http://policies.temple.edu/getdoc.asp?policy_no=03.70.02
- Students should be familiar with the University statement on academic
honesty, found at the following link:
http://www.temple.edu/bulletin/Responsibilities_rights/responsibilities/responsibilities.shtm
- The grievance procedures are available online
PREREQUISITES
- A general familiarity and basic level of comfort with probability and statistics are essential and will be assumed. Any of the following courses, or specific permission of the instructor, should be enough: CIS 8525, 8526, 8527, 9603, 9664.
TEXT
There is no textbook for this course. We will be reading extensively from the research literature.
DESCRIPTION
This course will give a broad overview of problems and techniques in natural language processing, and then move on to cover the latest research in selected topics. The overview part of the course will cover problems in:
- Information Retrieval: Building indexes, data compression, representation of queries and documents, and similarity functions.
- Information Extraction: Building hierarchies of knowledge (ontologies), determining the meaning of words, and determining the relationships that exist between entities referred to in text.
The in-depth part of the course will focus on the latest research in unsupervised information extraction. This part of the course will cover such techniques as stemming, pointwise mutual information, pattern-matching, bootstrapping, TF-IDF, n-gram models, Hidden Markov Models, Conditional Random Fields, statistical parsing, clustering, and language modeling.
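As a small illustration of two of the ideas named above (TF-IDF document representation and a similarity function), here is a minimal Python sketch. It is not course material: the whitespace tokenizer, the tf * log(N / df) weighting variant, the toy documents, and the function names tf_idf_vectors and cosine are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: documents are tokenized by lowercasing and splitting on
    whitespace; real systems use more careful tokenization and stemming.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): two related sentences and one unrelated one.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock markets fell sharply today",
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # the documents share "the", "sat", "on"
sim_02 = cosine(vecs[0], vecs[2])  # the documents share no terms
```

In this sketch the first two documents receive a positive similarity because they share terms, while the first and third share none and score zero. Terms that appear in every document would get weight zero under this weighting, which is the usual motivation for the IDF factor.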
GRADING
- Quizzes: 30%
- In-class Participation: 20%
- Final Project: 50%
EXAMS AND QUIZZES
All exams and quizzes are closed book. Their content is cumulative, i.e., they cover
the material from the entire semester up to the day of the exam. If a student misses
the midterm because of an emergency (as agreed with the instructor), there will be no
makeup exam: the homework, quizzes, and final project will become
proportionally more important. If a student misses the midterm without prior
agreement and without definitive proof of the medical or legal reasons,
he or she will receive a zero for that exam. Missed quizzes
will not be made up.
FINAL PROJECT
Several project ideas will be suggested during the semester, but students are free to propose their own, especially projects related to their current research. Students will be expected to come up with innovative, novel solutions to problems in text mining and language processing.
Course projects will be undertaken individually or in small teams (2-3 students). Each student on a team will receive the same grade for the project; it is up to the team members to divide the work fairly.
More information on course projects will be provided soon.