Richard Beigel
Professor of Computer and Information Sciences

address | phone | photos | class | textbook | service | papers

bioinformatics | visual data mining | molecular computing | fault diagnosis | complexity theory | circuits | algorithms | bounded queries | mathematical theory of computation

Richard Beigel studied mathematics and computer science at Stanford University, where he received his BS, MS, and PhD. He has been a teacher and researcher in computer science since 1986 at the Johns Hopkins University, Yale University, Lehigh University, the University of Illinois at Chicago, Rutgers DIMACS, NEC Research Inc., and Temple University, where he is now a full professor in the Department of Computer and Information Sciences. His research interests include computational genomics, visual data mining, molecular computing, fault diagnosis, complexity theory, circuits, algorithms, bounded queries, and the mathematical theory of computation. He has been the advisor on three Ph.D. theses and nine senior projects. He and Robert Floyd have co-authored an innovative undergraduate textbook on automata theory. In 1989 he received a Presidential Young Investigator Award from the National Science Foundation to further his research and teaching. He has served on the Executive Committee for the IEEE Conference on Computational Complexity, and has also served on the program committees for several conferences, including the ACM Symposium on Theory of Computing.

Professional Activities


Papers (organized by research area)

Bioinformatics

Tettelin et al. proposed a new method for closing the gaps in whole-genome shotgun sequencing projects. The method uses a multiplex PCR strategy to minimize the time and effort required to sequence the DNA in the missing gaps. This procedure has been used in a number of microbial sequencing projects, including Streptococcus pneumoniae and other bacteria. We describe a theoretical framework for this problem and present nearly optimal algorithms for its solution.

We have also developed a computational framework for the synthesis of oligonucleotide microarrays.

Visual Data Mining

Let Ben Shneiderman tell you about visual data mining.

Dynamic query interfaces (DQIs) are a novel database interface developed at the University of Maryland's Human-Computer Interaction Lab (HCIL). They provide continuous real-time feedback to the user during the query formulation process. To see DQIs in action, visit one of HCIL's commercial spinoffs: IVEE. Experiments at HCIL show that DQIs are an elegant and powerful interface to small databases. However, previous DQI algorithms were too slow for large databases. Egemen Tanin, Ben Shneiderman, and I developed and implemented a new approach to DQIs that works well with large databases.
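The published data structures are more elaborate, but a toy sketch conveys the core idea behind fast DQIs: precompute a sorted index per attribute so that every movement of a range slider is answered by binary search rather than a scan of the database. (The class and method names below are hypothetical illustrations, not taken from the paper.)

```python
import bisect

class RangeSliderIndex:
    """Toy one-attribute index: answers 'how many records fall in
    [lo, hi]?' in O(log n) time, so a dragged slider can get feedback
    on every mouse event instead of rescanning the whole database."""

    def __init__(self, values):
        self.sorted_vals = sorted(values)  # one-time preprocessing

    def count_in_range(self, lo, hi):
        left = bisect.bisect_left(self.sorted_vals, lo)
        right = bisect.bisect_right(self.sorted_vals, hi)
        return right - left

idx = RangeSliderIndex([3, 1, 4, 1, 5, 9, 2, 6])
assert idx.count_in_range(2, 5) == 4  # matches the records 2, 3, 4, 5
```

The point of the design is that the per-event cost is logarithmic in the database size, which is what makes continuous feedback feasible on large databases.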

Query previews are a user-interface paradigm that facilitates browsing a huge collection. They were developed by HCIL for the NASA EOSDIS project. My contribution to the research applies geometric techniques to make query previews algorithmically feasible.

Molecular Computing

With its exponential parallelism, molecular computing offers the hope of solving important problems that have resisted solution on conventional computers. Various molecular operations have been proposed as feasible; because it is unclear which operations will be usefully implementable, algorithms that use the simplest operations are to be preferred. In particular, communication between processes may be impossible. Parallelism and running time are resources to be exploited efficiently.

It is known that molecular computers with limited operation sets, polynomial time, and exponential parallelism can solve any problem in the class NP. By further limiting the parallelism, various bounded-nondeterminism subclasses of NP are obtained. Bin Fu and I are investigating the relationships among molecular computation classes and these NP subclasses.

While many molecular algorithms for important problems use 2^n parallelism, this severely limits the size of problems that those algorithms can solve. In the last two decades we have seen an improvement from 2^n to 2^(cn) for various c < 1 in the running time of classical serial algorithms for some NP-complete problems, such as 3-SAT, graph coloring, and independent set. Using bounded nondeterminism as an algorithm-design technique, we are developing more efficient molecular algorithms for those and other important NP-complete problems.
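For a concrete sense of where 2^(cn) bounds come from, here is a minimal sketch (the classical branching idea, not any specific published algorithm) for 3-SAT: branch on the free literals of an unsatisfied clause, fixing 1, 2, or 3 variables per branch. This gives the recurrence T(n) = T(n-1) + T(n-2) + T(n-3), roughly 1.84^n instead of 2^n.

```python
def solve_3sat(clauses, assignment=None):
    """Branching search for CNF-SAT.  A clause is a tuple of nonzero
    ints: literal 3 means x3, literal -3 means NOT x3."""
    if assignment is None:
        assignment = {}

    def value(lit):
        v = assignment.get(abs(lit))
        return None if v is None else (v if lit > 0 else not v)

    for clause in clauses:
        vals = [value(l) for l in clause]
        if True in vals:
            continue                      # clause already satisfied
        free = [l for l, v in zip(clause, vals) if v is None]
        if not free:
            return None                   # clause falsified: backtrack
        # Branch: make the i-th free literal true after making the
        # earlier ones false -- the branches fix 1, 2, 3 variables.
        for i, lit in enumerate(free):
            trial = dict(assignment)
            for earlier in free[:i]:
                trial[abs(earlier)] = (earlier < 0)
            trial[abs(lit)] = (lit > 0)
            result = solve_3sat(clauses, trial)
            if result is not None:
                return result
        return None
    return assignment                     # every clause is satisfied

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) is satisfiable:
assert solve_3sat([(1, 2), (-1, 3), (-2, -3)]) is not None
assert solve_3sat([(1,), (-1,)]) is None  # x1 and not x1 is not
```

Each branch eliminates at least one variable and skips assignments that falsify the chosen clause, which is exactly the savings the recurrence captures.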

This research is supported by the National Science Foundation under grant CCR-9700417, ``Connections between Space-Bounded Molecular Computation and Classical Complexity Theory.''

Fault Diagnosis

Imagine a robot surveying a nuclear accident. As it collects data in radioactive hot spots, its processors tend to get fried. Before too many processors fail, the robot retreats to a safe spot in order to heal. During the healing process, the robot's processors test each other and communicate test results (some correct, some incorrect) to an external controller in a very safe location as far as possible from the accident. The controller determines which processors are really faulty and sends instructions to deactivate them. Then the robot may return to its mission.

The robot's ``healing'' process is called system level fault diagnosis, as proposed initially by Preparata, Metze, and Chien. When a good processor tests another processor it reports the testee's status correctly to the controller. But when a faulty processor tests another processor it may report the testee's status correctly or incorrectly. Such testing protocols have been usefully implemented by Bianchini et al.

More formally, we assume that each processor is either ``good'' or ``faulty,'' and that processors can test each other. (A good tester will tell whether the testee is good or faulty. A faulty tester may lie.) Tests may be performed in parallel, provided that each processor participates in at most one test per round. My student Will Hurwood's dissertation answers major questions on parallel system-level fault diagnosis algorithms in static, dynamic, and distributed models.

In static fault diagnosis, we assume that a strict majority of the processors are good and that no processors fail during testing; the objective is to determine the status (good or faulty) of every processor. The best known algorithm takes only 10 rounds, independent of the number of processors.
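The 10-round parallel algorithm is more intricate, but the model itself is easy to simulate. The sketch below is a classic sequential strategy (not the constant-round algorithm): find one certainly good processor with a stack, using the fact that a discarded tester/testee pair always contains at least one faulty processor, then let that good processor test everyone else.

```python
import random

def diagnose(statuses, rng):
    """statuses[i] is True iff processor i is good; a strict majority
    must be good.  Good testers report the truth about the testee;
    faulty testers answer arbitrarily (here: uniformly at random)."""
    n = len(statuses)

    def test(tester, testee):
        if statuses[tester]:
            return statuses[testee]        # good testers never lie
        return rng.choice([True, False])   # faulty testers may lie

    # Phase 1: find one certainly good processor.  If the top of the
    # stack calls the newcomer faulty, discard both: at least one of
    # the pair is faulty, so good processors remain in the majority.
    stack = []
    for p in range(n):
        if stack and not test(stack[-1], p):
            stack.pop()
        else:
            stack.append(p)
    king = stack[-1]   # good: every stack element above a good one
                       # was vouched for by the good element below it

    # Phase 2: the known-good processor diagnoses everyone else.
    return [p == king or test(king, p) for p in range(n)]

rng = random.Random(0)
truth = [True, False, True, True, False, True, True]
assert diagnose(truth, rng) == truth
```

The diagnosis is correct no matter how the faulty processors lie, because Phase 1's majority argument does not depend on their answers being consistent.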

In dynamic diagnosis, we assume that at most t processors fail per round and the controller can order at most t repairs per round; the objective is to maintain a dynamic equilibrium in which the number of faulty processors is bounded. The best algorithm limits the number of faulty processors to O(t log t) at all times.

In distributed diagnosis, there is no controller. Instead each good processor must determine the status of all other processors. This problem is equivalent to the collect problem in distributed computing. The best known algorithm for this problem is randomized and completes diagnosis after O(n log^3 n) tests with high probability, independent of which processors actually fail.

This research is supported by the National Science Foundation under grant CCR-9796317, ``Parallel Fault Diagnosis.''

To view my PowerPoint slides on parallel and distributed fault diagnosis, select compressed slides (68K) or uncompressed slides (419K).

Complexity Theory

The goal of complexity theory is to classify computational problems according to their computational complexity, that is, the minimum amount of resources needed to solve the problem. The classical computing devices are the Boolean circuit, the Turing machine, and the random-access machine. The classical computing mode is deterministic, and the classical resources are circuit size and depth, and time and space. Because of the present difficulty in proving lower bounds in those models, huge classes of important problems have resisted classification.

In order to classify more problems, new computing devices, modes of computation, and resources have been proposed. The new devices include threshold circuits, neural nets, DNA computers, and quantum computers; the modes include nondeterministic, alternating, probabilistic, counting, and relativized computation; and the resources include number of DNA strands, nondeterministic bits, alternations, random bits, and oracle queries.

A complexity class is the set of problems solvable by a particular device in a particular computing mode with a particular amount of certain resources. One paradigm in complexity theory is proving that one complexity class C1 is contained in another complexity class C2. Such a containment yields upper bounds for every problem in C1, and it transfers any lower bound known for a problem in C1 into a lower bound for the larger class C2.

Circuit Complexity

Low-degree polynomials over various rings are a very powerful tool for proving upper and lower bounds on circuits and counting classes. Although we were not the first to use them in this way, we systematized their use and demonstrated their usefulness in several contexts. Our most notable result uses polynomials and rational functions to show that the unbounded-error probabilistic class PP is closed under union and intersection, answering a question that had been open for 17 years. An extension of those techniques shows that small AND-OR circuits with polylog(n) majority gates are equivalent to small AND-OR circuits with only 1 majority gate.
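As a small taste of the polynomial method, here is the classic probabilistic construction (Razborov-style; a textbook illustration, not the PP result itself) of a degree-k polynomial over GF(2) approximating the n-input OR gate: on the all-zero input it always outputs 0, and on any other input it errs with probability at most 2^-k over the choice of random subsets.

```python
import itertools
import random

def approx_or(n, k, rng):
    """Sample a degree-k GF(2) polynomial approximating n-input OR:
    p(x) = 1 - prod_j (1 + sum_{i in S_j} x_i)  over GF(2),
    for k random subsets S_1..S_k.  If x = 0, every factor is 1, so
    p(x) = 0.  If x != 0, each random subset-parity equals 1 with
    probability 1/2, so all k factors are 1 with probability 2^-k."""
    subsets = [[i for i in range(n) if rng.random() < 0.5] for _ in range(k)]
    def p(x):
        prod = 1
        for S in subsets:
            prod *= (1 + sum(x[i] for i in S)) % 2
        return (1 - prod) % 2
    return p

rng = random.Random(7)
n, k = 6, 3
assert all(approx_or(n, k, rng)([0] * n) == 0 for _ in range(5))

# Empirical error rate over fresh random polynomials, all 2^6 inputs:
errors = trials = 0
for _ in range(10):
    p = approx_or(n, k, rng)
    for x in itertools.product([0, 1], repeat=n):
        trials += 1
        errors += (p(list(x)) != (1 if any(x) else 0))
assert errors / trials < 0.25   # well below 1/2; the per-input bound is 2^-k
```

Replacing each gate of a small circuit by such a low-degree approximator, with the error driven down by increasing k, is the basic move behind many of the circuit lower bounds alluded to above.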

Bounded Queries

A question to an oracle is called a ``query.'' Oracle queries are called ``parallel'' if all of them must be formulated before any of them is asked, i.e., if no question depends on the answer to another of the questions. Oracle queries are called ``serial'' if you learn the answer to each question before asking the next one, i.e., if the questions are adaptive.

If you can solve a decision problem in polynomial time with 3 parallel queries to SAT, can you solve it in polynomial time with 2 serial queries to SAT? Yes, you can. If you can compute a function in polynomial time with 3 parallel queries to SAT, can you compute it in polynomial time with 2 serial queries to SAT? No, you can't in general compute it with 2 serial queries to any set unless P = NP. Sets like SAT are called p-terse because they don't leak information. Although non-p-terse sets need not belong to P, non-p-terse sets must be recognized by polynomial-size circuits.
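The proofs rely on counting (``census'') tricks. One such trick is easy to sketch: with 2 serial queries you can binary-search for the exact number of satisfiable formulas among 3, because each threshold question ``are at least j of them satisfiable?'' is itself an NP predicate and hence a single query to SAT. (This alone does not prove the decision-problem result above, but it is the basic tool.) The brute-force solver below merely stands in for the SAT oracle.

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Stand-in for the SAT oracle (exponential; illustration only).
    Clauses are tuples of nonzero ints, literal -3 meaning NOT x3."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=n_vars))

def count_satisfiable(formulas, n_vars):
    """Exact count of satisfiable formulas among 3, using 2 serial
    threshold queries.  In the oracle model each call to at_least(j)
    would be one SAT query; here the stand-in just answers it."""
    def at_least(j):                      # 'are >= j of them satisfiable?'
        return sum(brute_force_sat(f, n_vars) for f in formulas) >= j
    if at_least(2):                       # first serial query
        return 3 if at_least(3) else 2    # second serial query
    return 1 if at_least(1) else 0        # second serial query

f_sat, f_unsat = [(1,)], [(1,), (-1,)]    # x1; x1 AND NOT x1
assert count_satisfiable([f_sat, f_unsat, f_sat], 1) == 2
```

The binary search works because the answer lies in {0, 1, 2, 3}, a set of size 4, and 2 adaptive yes/no questions distinguish exactly 4 outcomes.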

The study of bounded queries in the complexity-theoretic framework consists of comparing the power of polynomial-time computation with k queries of one type against the power of polynomial-time computation with j queries of another type.

There is much related work on bounded queries in recursion theory.

Counting Classes

Counting classes are defined by taking nondeterministic Turing machines and saying they accept if the number A of accepting computations satisfies a particular predicate. Using the predicate A > 0, we see that NP is a counting class. Using the predicate A > T/2, where T is the total number of computations, we obtain the class PP. Using the predicate ``A is odd,'' we obtain the class PARITYP. The low-degree polynomial method suffices to prove many relationships among counting classes.
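The three predicates can be made concrete with a brute-force SAT machine, where each truth assignment plays the role of one nondeterministic computation path (a toy illustration; the classes themselves are defined over all polynomial-time machines).

```python
from itertools import product

def accepting_paths(clauses, n_vars):
    """Count the accepting paths of a brute-force SAT machine: each of
    the 2^n_vars assignments is one nondeterministic computation path,
    and a path accepts iff it satisfies every clause."""
    return sum(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=n_vars))

clauses, n_vars = [(1, 2), (-1, 2)], 2    # (x1 or x2) and (not x1 or x2)
A, T = accepting_paths(clauses, n_vars), 2 ** n_vars

in_np      = A > 0          # NP:      some path accepts
in_pp      = A > T / 2      # PP:      a majority of paths accept
in_parityp = A % 2 == 1     # PARITYP: an odd number of paths accept

assert (A, in_np, in_pp, in_parityp) == (2, True, False, False)
```

The same machine, under three different acceptance predicates, witnesses membership in three different counting classes; that is the sense in which the predicate, not the machine, defines the class.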

Miscellaneous Complexity Theory

Mathematical Theory of Computation

Recursion theory deals with what functions are computable and what decision problems are solvable in a finite amount of time.

Bounded Queries

Can you solve 3 instances of the halting problem in a finite amount of time, by asking only two questions to the halting set K? Yes, you can. (Try to figure it out yourself. The second question depends on the answer to the first question.) Sets like K are called verbose because they ``leak'' information. Complete sets for higher levels of the arithmetical hierarchy are, however, terse.

Can you solve 4 instances of the halting problem in a finite amount of time, by asking only two questions to the halting set K? No, in fact you can't solve 4 instances of any nonrecursive problem by asking only two questions to any set, even a much harder one.
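The positive result for 3 instances rests on a counting idea (this sketch spoils the puzzle above). Two threshold questions ``do at least j of the 3 programs halt?'' -- each a c.e. question, hence one query to K -- pin down the exact number c that halt; dovetailing the programs until c of them have halted then reveals which ones. The toy model below represents programs as Python generators and mocks the K-oracle from known ground truth, since a real halting oracle cannot be implemented.

```python
def which_halt(programs, oracle):
    """Diagnose 3 halting-problem instances with 2 oracle queries.
    oracle(j) answers 'do at least j of the programs halt?'; binary
    search over j finds the exact count c, then we dovetail until
    c programs have halted -- the rest are known to run forever."""
    if oracle(2):                          # first query
        c = 3 if oracle(3) else 2          # second query
    else:
        c = 1 if oracle(1) else 0          # second query
    halted, running = set(), dict(enumerate(programs))
    while len(halted) < c:
        for i, prog in list(running.items()):
            try:
                next(prog)                 # run one more step
            except StopIteration:          # program i halted
                halted.add(i)
                del running[i]
    return [i in halted for i in range(3)]

# Toy model: a program is a generator; exhausting it means halting.
def halts_after(k):
    for _ in range(k):
        yield

def loops_forever():
    while True:
        yield

truth = [True, False, True]
progs = [halts_after(3), loops_forever(), halts_after(5)]
mock_oracle = lambda j: sum(truth) >= j    # mocked from ground truth
assert which_halt(progs, mock_oracle) == truth
```

The dovetailing loop terminates precisely because the oracle told us how many programs will eventually halt, so we never wait on a program that runs forever.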

We have made a general study of when k+1 queries are more powerful than k in recursion theory.