Racing to the winning line with visual data mining

Ben Shneiderman, University of Maryland
April 29, 1997

Racing car drivers view their world at more than 300 kilometers/hour. As the road swerves, they focus on other cars just centimeters away and choose their moment to accelerate into the lead.

Business decision makers are also on a race course with aggressive competitors close by, jockeying for the lead. Any advantage in seeing the road more clearly or steering more sharply may enable them to grab the lead, as well. This is the promise of information visualization and visual data mining.

For many years business decision makers worked with information presented in tables or bar charts. These tools are effective with small data volumes and for predictable behaviors. However, as data volumes have grown and the pace of business has accelerated, new tools may pay off in finding ways to steer more quickly and accurately.

Information visualization and visual data mining enable users to browse large datasets and find patterns, correlations, clusters, gaps, and outliers that reveal opportunities for action. When a marketing manager finds high rates of home purchases in a previously quiet region, this suggests an opportunity to sell furniture or appliances to the new residents. When a stock broker finds a fast-growing company with a low price/earnings ratio in an expanding industry, this suggests a buy recommendation.

Today's information-abundant environments mean that huge data resources are available, but scrolling through lists of thousands of stocks with hundreds of data attributes is not effective. Spreadsheets, database management systems, and statistical packages can help, but only with standard search strategies.

Human perceptual abilities are remarkable at spotting orderly and unusual patterns, but current tools have not made good use of these abilities. Now, progress in information visualization algorithms, display techniques, and user controls has made a new generation of software tools possible. Originating at advanced research centers such as Xerox PARC, Carnegie Mellon University, and the University of Maryland, ideas such as the starfield display are spawning exciting commercial software tools such as Spotfire from IVEE Development (www.ivee.com).

Spotfire enables users to view tens of thousands of data points, with as many as five attributes per data point. But the two-dimensional display is just the start line for the drivers. They have powerful controls that enable them to choose among dozens of attributes and then steer through the range of values within seconds. As the data points come and go, understanding grows and insights emerge.

For example, in a stock market database, growth rates can be plotted against price/earnings ratios, with size coding for income, and color coding for profitability -- brighter colors can show higher profits. Then check boxes allow users to choose industry groups and sliders allow selection of total income, company size, or capitalization.
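The stock market example above can be sketched in a few lines of code. This is only an illustrative sketch, not how Spotfire or any other product is implemented: the Company record, the visible_points function, the field names, and the sample companies are all hypothetical. It shows the basic idea of a dynamic query, in which check boxes and a slider decide which points stay on the display, and the remaining attributes are mapped to position, size, and color brightness.

```python
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical record layout, chosen only for this illustration.
    name: str
    industry: str
    growth_rate: float    # percent per year, plotted on the x axis
    pe_ratio: float       # price/earnings ratio, plotted on the y axis
    income: float         # total income in millions, mapped to marker size
    profit_margin: float  # fraction of income, mapped to color brightness


def visible_points(companies, checked_industries, income_range):
    """Return display records for companies that pass the current controls.

    checked_industries plays the role of the check boxes, and
    income_range (low, high) plays the role of a double-ended slider.
    """
    low, high = income_range
    points = []
    for c in companies:
        if c.industry not in checked_industries:
            continue  # industry check box is not ticked
        if not (low <= c.income <= high):
            continue  # outside the slider range
        points.append({
            "name": c.name,
            "x": c.growth_rate,
            "y": c.pe_ratio,
            "size": c.income,
            # Clamp to 0..1 so that brighter means more profitable.
            "brightness": max(0.0, min(1.0, c.profit_margin)),
        })
    return points


# Made-up sample data, not real companies or figures.
SAMPLE = [
    Company("Acme Tools", "manufacturing", 12.0, 9.5, 400.0, 0.18),
    Company("Birch Software", "software", 35.0, 14.0, 120.0, 0.30),
    Company("Cedar Retail", "retail", 4.0, 22.0, 900.0, 0.05),
    Company("Delta Systems", "software", 28.0, 40.0, 2500.0, 0.22),
]

if __name__ == "__main__":
    # Tick only "software" and drag the income slider to 0..1000.
    for p in visible_points(SAMPLE, {"software"}, (0, 1000)):
        print(p)
```

Each time a user moves a slider or ticks a check box, a tool of this kind would rerun the filter and redraw the display, which is what makes points appear to come and go as the controls change.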

A marketing example might be to plot customer age against household income, with size coding for previous purchases and color coding to show gender. Check boxes could indicate residential regions and levels of education, while sliders might indicate discretionary income, length of residence, and number of children.

Almost anyone who has data can benefit from a visual presentation with powerful user controls. The patterns they see will often confirm their hard-won understandings, but more than likely they will also discover anomalies in their data, surprising outliers, and interesting clusters. Furthermore, visual patterns are more likely to be remembered and therefore facilitate comparison across databases and recognition of changes over time.

But bringing benefits to the individual user is just the beginning for visual data mining. The potential of enabling corporate knowledge communities that build on each other's work is what may make this technology soar. As individuals discover and create interesting and revealing visualizations, they can save them for reuse by others. For example, a weekly sales visualization could be designed by an analyst for presentation at Monday-morning meetings of decision makers. Then the decision makers could see progress or spot problems and immediately adjust the controls to understand what has changed and why. Centralized databases are already a reality in many organizations, so including a library of visualizations can add tremendous value at low cost.