III-COR: Collaborative Research: Mining Biomedical and Network Data Using Tensors
Supported by the National Science Foundation (NSF) -
IIS-0705359,
IIS-0705215
Investigators
Christos Faloutsos, PhD
Department of Computer Science
Carnegie Mellon University
Wean Hall
5000 Forbes Ave.
Pittsburgh, PA 15213-3891
412-268-1457, 412-268-5576 (fax)
christos AT cs DOT cmu DOT edu
http://www.cs.cmu.edu/~christos
Vasileios Megalooikonomou, PhD
Department of Computer and Information Sciences
Temple University
314 Wachman Hall
1805 N. Broad Street
Philadelphia PA 19122
215-204-5774, 215-204-5082 (fax)
vasilis AT cis DOT temple DOT edu
http://www.cis.temple.edu/~vasilis
List of Personnel
-
Christos Faloutsos, PhD,
Professor of Computer Science, CMU.
-
Vasileios Megalooikonomou, PhD,
Associate Professor of Computer Science, Temple Univ.
- Scott H. Faro, MD, Professor and Vice Chair, Radiology, Temple University.
- Feroze Mohamed, PhD, Associate Professor, Radiology, Temple University.
- Charalambos Tsourakakis,
Ph.D. student, CMU.
- Michael Barnathan,
Ph.D. student, Temple Univ.
Keywords
Data mining
fMRI images
network traffic
tensors
Project Summary
Given a large collection of fMRI images over time, how can one find patterns and correlations? Similarly, given a never-ending stream of network traffic information, how can one monitor for anomalies, intrusions, and potential failures? The main idea behind this proposal is to treat both problems using the theory of tensors. Despite the seemingly wide differences in the two settings, they both boil down to finding patterns in multidimensional arrays, sparse or dense. Tensors are exactly generalizations of matrices, and correspond roughly to ``DataCubes'' of data mining. Matrix analysis and decompositions are part of the standard toolbox for data mining, with SVD/PCA/LSI being the typical methods for dimensionality reduction, pattern discovery and ``hidden variable'' discovery. Extending these tools to higher dimensionalities is valuable and tensors provide the tools to do this generalization. However, these tools have not yet been put to use in large volume data mining. This is the main contribution of this proposal. The investigators propose (a) to design tensor decomposition algorithms that scale for large datasets, with special attention to sparse datasets, and to never-ending streams of data and (b) to apply them on two driving applications, fMRI data analysis and network data analysis.
The investigators propose to analyze large volumes of fMRI data performing the following sub-tasks: cluster voxels with similar behavior over time for a given subject and/or task or across subjects and/or tasks, classify patterns of brain activity, and detect lag correlations and spatio-temporal patterns among fMRI time sequences. The investigators also propose to perform the following inter-related tasks on multiple GigaBytes of network flow data: anomaly detection, pattern discovery, and compression.
Both of these applications are important for medicine, health management, and for computer and national security. Analysis of fMRI data can help understanding how the brain functions, which parts of the brain collaborate with what other parts, and whether there are variations across subjects and across task-related activities. For the network traffic monitoring setting, fast detection of anomalies is important, to spot malware, port-scanning attempts, and just plain non-malicious failures.
The educational goals include incorporating the research findings in advanced graduate courses at CMU (15-826) and at Temple (9664, 9665) and proposing tutorials in leading conferences in databases, data mining and bio-informatics audiences.
Publications and Products
Q. Wang, V. Megalooikonomou, C. Faloutsos,
"Time Series Analysis with Multiple Resolutions",
Information Systems, (to appear).
E. Miranda, G. Shan, and V. Megalooikonomou,
"Performing Vector Quantization Using Reduced Data Representation",
Proceedings of the Data Compression Conference (DCC),
Salt Lake City, Utah, 2009.
Q. Wang, V. Megalooikonomou,
"A Performance Evaluation Framework for Association Mining in Spatial Data",
Journal of Intelligent Information Systems, (under review).
V. Megalooikonomou, M. Barnathan, D. Kontos, P. R. Bakic, A. D.A. Maidment,
"A Representation and Classification Scheme for Tree-like Structures in
Medical Images: Analyzing the Branching Pattern of Ductal Trees in X-ray
Galactograms",
IEEE Transactions on Medical Imaging (in press).
M. Barnathan, R. Li, V. Megalooikonomou, F. Mohamed, S. Faro,
"
Wavelet Analysis of 4D Motor Task fMRI Data",
Proceedings of the 22nd International Congress and Exhibition on Computer
Assisted Radiology and Surgery, Barcelona, Spain, 2008.
J. Sun, T. Eliassi-Rad, C. E. Tsourakakis, E. Hoke,
C. Faloutsos, "Two heads better than one: Pattern Discovery in
Time-evolving Multi-Aspect Data" , PKDD, 2008.
A. P. Appel, D. H. Chau, C. E. Tsourakakis, C. Faloutsos, J. Leskovec,
"Shatterviz: Principled Graph Summarization and Visualization" ,
(under review).
C. Tsourakakis, "Fast Counting of triangles in real-world
networks: proofs, algorithms and observations" ,
CMU Technical Report, 2008.
Other specific products
Coming soon...
Area References
L.R. Tucker."Some mathematical notes on three-mode factor
analysis," Psychometrika,
31(3):279--311, 1966.
R.A. Harshman."
Foundations of the parafac procedure: model and conditions for an
explanatory multi-mode factor analysis,"
UCLA working papers in phonetics,
16:1--84, 1970.
Petros Drineas and Michael W. Mahoney."
A randomized algorithm for a tensor-based generalization of the svd,"
Technical Report, 2005.
Jimeng Sun, Dacheng Tao, and Christos Faloutsos."
Beyond streams and graphs: Dynamic tensor analysis,"
KDD,2006.
V. Megalooikonomou, J. Ford, L. Shen, F. Makedon, and Andrew Saykin.
"Data mining in brain imaging.
Some mathematical notes on three-mode factor,"
Statistical Methods in Medical Research,
9(4):359--394, 2000.
Project Websites
http://knight.cis.temple.edu/~vasilis/research/tensors.html
This is the main website for our project.