Massive and high-dimensional numerical (or continuous) data may be visualized using parallel coordinates. For a technical discussion of parallel coordinates, see (Wegman 1990). In parallel coordinates, the axes are drawn parallel to one another. A vector (or row) of data, (x1, x2, …, xn), is plotted by marking x1 on axis 1, x2 on axis 2, and so on through xn on axis n. The plotted points are then joined by a broken line. Visualizing massive and high-dimensional data with parallel coordinates is often a first step in exploratory data analysis (EDA), where one may wish to visually identify patterns, clusters, or outliers. For the purposes of EDA, a generalized rotation of the coordinate axes in high-dimensional space, referred to as the grand tour (Wegman 2002), may be used in combination with hue and saturation brushing techniques (Wegman 1996).
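The plotting rule can be sketched in a few lines of Java. This is only an illustration and not VizApp's actual code: the class name `ParallelCoordsSketch`, the method `polyline`, the evenly spaced axes, and the min-max scaling of each axis are all assumptions made for the example.

```java
/** Hypothetical sketch: maps one data row to the vertices of its broken line. */
final class ParallelCoordsSketch {

    /**
     * Returns one {x, y} vertex per axis for a single row. Axis k is a vertical
     * line at an evenly spaced x position. The value row[k] is scaled between
     * min[k] and max[k] so that min lands at the bottom and max at the top.
     */
    static double[][] polyline(double[] row, double[] min, double[] max,
                               double width, double height) {
        int n = row.length;
        double[][] pts = new double[n][2];
        for (int k = 0; k < n; k++) {
            // Evenly spaced parallel axes (a single axis sits at the left edge).
            double x = (n == 1) ? 0.0 : k * width / (n - 1);
            double range = max[k] - min[k];
            // A constant column has no range; place it mid-axis (an assumption).
            double t = (range == 0.0) ? 0.5 : (row[k] - min[k]) / range;
            pts[k][0] = x;
            // Screen y grows downward, so invert the scaled value.
            pts[k][1] = height - t * height;
        }
        return pts;
    }
}
```

Drawing a line between each pair of consecutive vertices gives the broken line for that row; repeating this for every row gives the full display.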
In this open source release, called VizApp, parallel coordinates, grand tour, and color brushing options are available. VizApp is implemented in Java 1.4 and requires Java 1.4 or later to run (it may not work correctly on earlier Java versions). The source code of VizApp may be downloaded here.
Other open source software for parallel coordinates is available: parvis, picviz, and ggobi. To my knowledge, none of these packages provides the grand tour feature. VizApp can also export the grand tour as JPGs or as a video file (MOV format only). Here are some significant features of VizApp:
- parallel coordinate display
- grand tour
- export grand tour as JPGs
- export grand tour as video file (MOV format)
- color brushing
- switching axes interactively
- supports multiple file types
Using VizApp is relatively simple. First, download the binary distribution here. Assuming you have Java 1.4 installed (the JRE or the JDK), click on run.bat. When the application is running, you can drop files into the text field or the main drawing panel. The supported file types are those for Crystal Vision, parvis, and Xmdv (OKC type only). By following the links for parvis and Xmdv, you may study their file formats. For Crystal Vision, the file format is as follows:
- variables: n
- x1 x2 … xn
An example Crystal Vision input file looks like the following:
variable: 3
labels: x1 x2 x3
1.2 3.2 3.2
3.4 6.6 9.9
...
3.9 9.8 1.1
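A reader for this layout might be sketched as follows. This is not VizApp's loader: the class `CrystalVisionSketch` and its `parse` method are hypothetical, and the sketch assumes that labels contain no whitespace, that the header keyword may be spelled either `variable:` or `variables:`, and that line breaks carry no meaning (tokens are simply read in order).

```java
import java.util.StringTokenizer;

/** Hypothetical token-based reader for the layout shown in the example. */
final class CrystalVisionSketch {
    final String[] labels;
    final double[][] rows;

    private CrystalVisionSketch(String[] labels, double[][] rows) {
        this.labels = labels;
        this.rows = rows;
    }

    /** Parses the full text of a file; throws IllegalArgumentException on bad input. */
    static CrystalVisionSketch parse(String text) {
        // The default delimiters cover spaces, tabs, and line breaks.
        StringTokenizer st = new StringTokenizer(text);

        // Accept either spelling of the header keyword (an assumption).
        String head = next(st);
        if (!head.equals("variable:") && !head.equals("variables:")) {
            throw new IllegalArgumentException("expected variable header, got " + head);
        }
        int n = Integer.parseInt(next(st));
        if (n <= 0) {
            throw new IllegalArgumentException("variable count must be positive");
        }
        if (!next(st).equals("labels:")) {
            throw new IllegalArgumentException("expected 'labels:'");
        }
        String[] labels = new String[n];
        for (int k = 0; k < n; k++) {
            labels[k] = next(st);
        }

        // Every remaining token is a number; n of them make one row.
        int remaining = st.countTokens();
        if (remaining % n != 0) {
            throw new IllegalArgumentException("incomplete final row");
        }
        double[][] rows = new double[remaining / n][n];
        for (int r = 0; r < rows.length; r++) {
            for (int k = 0; k < n; k++) {
                rows[r][k] = Double.parseDouble(next(st));
            }
        }
        return new CrystalVisionSketch(labels, rows);
    }

    private static String next(StringTokenizer st) {
        if (!st.hasMoreTokens()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return st.nextToken();
    }
}
```

The `...` in the example stands for omitted rows; a real file would contain numbers there, and this sketch would reject a literal `...` token.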
In fact, the other file formats are similar to this one. The following screen shot shows VizApp running.
Once you have the file loaded up, the data will be plotted. From here, you can choose to view the grand tour, color the set of broken lines, move the axes, or export the grand tour. The following screen shot shows a file loaded up in VizApp.
To view the grand tour, hit “Play.” To stop the grand tour, hit “Stop.” To reset the display, hit “Reset.”
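What “Play” animates can be pictured as a rotation of the data that changes a little with every frame. The sketch below is a simplified illustration of that idea, not the algorithm in VizApp or in (Wegman 2002): it composes elementary rotations in each coordinate plane, and the class `GrandTourSketch`, the time parameter `t`, and the per-plane `speed` table are all hypothetical.

```java
/** Hypothetical sketch of a rotation that evolves over time. */
final class GrandTourSketch {

    /** Rotates v by theta radians in the plane of coordinates i and j. */
    static double[] rotatePlane(double[] v, int i, int j, double theta) {
        double[] out = v.clone();
        double c = Math.cos(theta);
        double s = Math.sin(theta);
        out[i] = c * v[i] - s * v[j];
        out[j] = s * v[i] + c * v[j];
        return out;
    }

    /**
     * Computes the rotated vector for time t by applying one plane rotation
     * for every pair i < j. Only the upper triangle of speed is read; giving
     * each plane a different speed keeps the motion from repeating quickly.
     */
    static double[] frame(double[] v, double t, double[][] speed) {
        double[] out = v.clone();
        for (int i = 0; i < v.length; i++) {
            for (int j = i + 1; j < v.length; j++) {
                out = rotatePlane(out, i, j, t * speed[i][j]);
            }
        }
        return out;
    }
}
```

Because each step is a pure rotation, the length of every data vector is unchanged; only the view of the data changes. Each rotated row would then be drawn on the parallel axes as usual.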
To color the broken lines, select the “Color Lines” radio button (at the bottom). Select a color by clicking on the “Color” button. Then left click and drag the mouse to draw a rectangle around the lines that you want to color. To cancel coloring while dragging, right click the mouse.
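One way to decide which broken lines a brush rectangle touches is to test every segment against the rectangle. The sketch below assumes that a line counts as selected when any of its segments crosses or lies inside the rectangle; whether VizApp uses exactly this rule is not stated here, and the class `BrushSketch` is hypothetical. The intersection test itself is the standard `Rectangle2D.intersectsLine` method.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical sketch: finds the broken lines touched by a brush rectangle. */
final class BrushSketch {

    /**
     * polylines[r] holds the {x, y} vertices of row r in screen coordinates.
     * Returns a flag per row that is true when any segment of that row
     * intersects the brush rectangle.
     */
    static boolean[] select(double[][][] polylines, Rectangle2D brush) {
        boolean[] hit = new boolean[polylines.length];
        for (int r = 0; r < polylines.length; r++) {
            double[][] p = polylines[r];
            // Stop scanning a row as soon as one of its segments is hit.
            for (int k = 0; k + 1 < p.length && !hit[r]; k++) {
                hit[r] = brush.intersectsLine(p[k][0], p[k][1],
                                              p[k + 1][0], p[k + 1][1]);
            }
        }
        return hit;
    }
}
```

The rows flagged `true` would then be redrawn in the chosen color.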
To switch the axes, select the “Move Axis” radio button (at the bottom). Move your mouse to the axis that you want to move and left click. While holding the button down, drag this axis to the one you want to swap places with (in the example shown, the x3 and x5 axes are swapped).
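Swapping two axes need not touch the data itself. A minimal sketch, assuming the display keeps a separate array that records which variable is shown at each axis position (the class `AxisOrderSketch` and both methods are hypothetical, not taken from VizApp):

```java
/** Hypothetical sketch: axis order kept as a permutation of variable indices. */
final class AxisOrderSketch {

    /** Swaps the variables shown at display positions a and b (in place). */
    static void swap(int[] order, int a, int b) {
        int tmp = order[a];
        order[a] = order[b];
        order[b] = tmp;
    }

    /** Returns the values of one row in display order. */
    static double[] arrange(double[] row, int[] order) {
        double[] out = new double[order.length];
        for (int pos = 0; pos < order.length; pos++) {
            out[pos] = row[order[pos]];
        }
        return out;
    }
}
```

With five variables, swapping positions 2 and 4 (the x3 and x5 axes, counting from zero) changes the order from 0, 1, 2, 3, 4 to 0, 1, 4, 3, 2.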
To export the grand tour, go to Tools -> Export (or press Alt-E). Here you can choose the name of the file you want to export to (the file name should end with the MOV extension). You can also choose whether the application should clean up the generated JPGs. When you are ready, click “Start.” When you are done, click “Cancel.”
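The JPG half of such an export can be sketched with the standard `javax.imageio` API, which has been part of Java since 1.4. This is an illustration only: the class `FrameExportSketch`, the frame file naming, and the drawing colors are assumptions, and how VizApp assembles the frames into a MOV file is not covered.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical sketch: renders one frame off screen and saves it as a JPG. */
final class FrameExportSketch {

    /** Draws every broken line into an off-screen image. */
    static BufferedImage render(double[][][] polylines, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLUE);
            for (int r = 0; r < polylines.length; r++) {
                double[][] p = polylines[r];
                for (int k = 0; k + 1 < p.length; k++) {
                    g.drawLine((int) Math.round(p[k][0]), (int) Math.round(p[k][1]),
                               (int) Math.round(p[k + 1][0]), (int) Math.round(p[k + 1][1]));
                }
            }
        } finally {
            g.dispose();
        }
        return img;
    }

    /** Writes one frame as frame<index>.jpg (an assumed naming scheme). */
    static File writeFrame(BufferedImage img, File dir, int index) throws IOException {
        File out = new File(dir, "frame" + index + ".jpg");
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(img, "jpg", out)) {
            throw new IOException("no JPEG writer available");
        }
        return out;
    }
}
```

Calling `render` and `writeFrame` once per grand tour step would produce the numbered image sequence.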
The performance of VizApp depends on your computer’s processing power and available memory. The script that runs VizApp sets the maximum heap size to 512M. As expected, larger datasets run slower than smaller ones (for rendering, the grand tour, and image/movie export). Be careful with the antialias option: even for a comparatively small dataset, it may slow things down. It is recommended to use antialias only for good-looking screen shots. The following screen shot shows VizApp with antialias turned on.
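In Java 2D, antialiasing is usually switched on through a rendering hint on the `Graphics2D` object. The helper below shows that standard call; the class `AntialiasSketch` is hypothetical, and it is an assumption that VizApp's option works this way.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;

/** Hypothetical sketch: toggles antialiasing for subsequent drawing calls. */
final class AntialiasSketch {

    static void setAntialias(Graphics2D g, boolean on) {
        // Smoothed edges look better but cost more per line drawn.
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                on ? RenderingHints.VALUE_ANTIALIAS_ON
                   : RenderingHints.VALUE_ANTIALIAS_OFF);
    }
}
```

Leaving the hint off during the grand tour and turning it on only for a final still image is one way to follow the recommendation above.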
Some planned improvements to VizApp (way off in the future) include:
- remove axes interactively
- remove data vectors interactively
- ability for saturation brushing
- ability to visualize scatter plot with grand tour
- ability to visualize 3D plot with grand tour
- ability to inspect vectors interactively
- ability to undo user gestures (e.g., coloring)
Parallel coordinates, in combination with the grand tour and hue/saturation brushing, are a useful data mining tool for visualizing massive and high-dimensional datasets. Many tools and software packages deal with parallel coordinates. VizApp adds to the collection of open source software available.
I hope you have fun with VizApp. Happy data mining! Sib ntsib dua (until we meet again)!
E. J. Wegman and J. L. Solka. “On some mathematics for visualizing high dimensional data,” Sankhyā, vol. 64, no. 2, 2002, pp. 429–452.
E. J. Wegman and Q. Luo. “High Dimensional Clustering Using Parallel Coordinates and the Grand Tour,” Computing Science and Statistics, vol. 28, 1996, pp. 361–368.
E. J. Wegman. “Hyperdimensional Data Analysis Using Parallel Coordinates,” Journal of the American Statistical Association, vol. 85, 1990, pp. 664–675.