
There’s more to color than meets the eye

It has come to my attention that the tweets are out of the bag about the new interactive admissions map on the IMA dashboard. The map is a mashup of our admissions data using the Google Maps API and a zip code demographics resource called ZIPskinny. I thought that I would take some time today to discuss the art and science of colormap selection that went into developing this visualization.

Admissions Map

First, a quick overview of some technical bits. The user selects a range of dates from the map interface, and a new set of markers is rendered after an AJAX query to the backend PHP code, which requests the data from our database and aggregates it at the zip code and state levels. We chose this two-level aggregation because of the performance hit we would take trying to render on the order of 7000 icons for a year’s worth of data on a nationwide map. The MarkerManager class (which used to be part of the core Google Maps API) displays one set of icons when the map is zoomed in past what might be called “state level” and another set when it is zoomed out further. This adds a bit of extra complexity to our colormap choice.
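The two-level aggregation can be sketched roughly as follows. The real backend is PHP and its schema is not shown in this post, so the record layout (zip code, state, count), the sample rows, and the function name `aggregate_admissions` are all assumptions made for illustration.

```python
from collections import Counter

def aggregate_admissions(records):
    """Sum admission counts per 5-digit zip code and per state.

    `records` is an iterable of (zip_code, state, count) tuples. This
    layout is hypothetical, not the museum's real schema.
    """
    by_zip = Counter()
    by_state = Counter()
    for zip_code, state, count in records:
        # 9-digit codes such as "46208-4196" fold into their 5-digit prefix.
        by_zip[zip_code[:5]] += count
        by_state[state] += count
    return by_zip, by_state

# Made-up sample rows, for illustration only.
sample = [
    ("46208", "IN", 120),
    ("46208-4196", "IN", 5),
    ("60614", "IL", 3),
]
by_zip, by_state = aggregate_admissions(sample)
```

In a setup like this, the zip-level totals would feed the markers shown when zoomed in, and the state-level totals would feed the markers shown when zoomed out.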

There are a number of ways that this data could have been mapped into visual symbols. The size or shape of the glyphs could be related to the number of admissions, for example. The method that we are using maps the number of admissions to color (when this mapping is stored, it is called a colormap). The particular mapping that we use is important, because some mappings are better than others for certain tasks. For the markers on the map, we use a colormap that smoothly transitions from a blue at the low end to a red at the high end. The particular choice of blue to red leverages our cultural understanding of blue as “cold” and red as “hot” (note that sometimes colormap choice depends on cultural interpretation), a metaphor which works well as a representation of low vs. high admission rates. Together, these two choices (a smooth transition and a cold-to-hot metaphor) let us understand the general trends in the data without needing to refer back to the legend frequently. The drawback is that it is a bit more difficult to compare the values of two individual markers than it would be if we had picked a colormap of more distinct hues such as {red, orange, yellow, green, blue, purple}. We tried to make it easier for the viewer to make comparisons by limiting the number of colors in our legend.
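As a minimal sketch of a smooth blue-to-red colormap with a limited number of legend colors, here is one way to generate the steps. The actual icon colors are not listed in this post, so the straight RGB interpolation and the function name `blue_to_red` are assumptions; a real map might well blend in a different color space.

```python
def blue_to_red(n_colors):
    """Return n_colors hex strings fading linearly from blue to red."""
    if n_colors < 2:
        raise ValueError("need at least two colors")
    colors = []
    for i in range(n_colors):
        t = i / (n_colors - 1)        # 0.0 at the cold end, 1.0 at the hot end
        red = round(255 * t)
        blue = round(255 * (1 - t))
        colors.append(f"#{red:02x}00{blue:02x}")
    return colors

# Eight steps, one per legend entry.
legend = blue_to_red(8)
```

Keeping `n_colors` small is what makes individual markers easier to compare against the legend.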

The next important choice is how exactly to map the number of admissions to a particular color. To do this well, an understanding of the distribution of the data is necessary. Intuitively, we would expect high admissions numbers for the zip codes closest to the museum, increasingly lower admissions from more distant zip codes, and many zip codes with very low but non-zero admissions numbers. The best way to look at this objectively is with a histogram. I will take a number of approaches with the data from the year 2008, which covers 6806 zip codes. The highest raw admission count for a single zip code that year was 12,375 (during the aggregation, 9 digit zip codes are merged into 5 digit zip codes, resulting in slightly higher numbers in some instances).

Linear histogram

This first histogram segments the zip codes into buckets in a linear fashion, with each bucket representing 10% of 12,375 (about 1,238 admissions). As you can see, the vast majority of the zip codes end up in the first bucket. If we used 10 colors with this linear mapping, almost all of the glyphs would be blue. It would be great for identifying and comparing the few zip codes that have very high admissions, but a lot of subtlety in the big picture would be lost.
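The linear bucketing is easy to sketch. The sample counts below are invented to mimic the long-tailed shape described above, and the function names are hypothetical; only the maximum of 12,375 and the ten equal-width buckets come from the post.

```python
from collections import Counter

def linear_bucket(count, max_count, n_buckets=10):
    """Map a count to a bucket index 0..n_buckets-1 using equal-width buckets."""
    # Integer arithmetic avoids floating-point surprises at bucket boundaries.
    return min(n_buckets - 1, count * n_buckets // max_count)

def histogram(counts, max_count, n_buckets=10):
    """Count how many values land in each bucket."""
    tally = Counter(linear_bucket(c, max_count, n_buckets) for c in counts)
    return [tally[i] for i in range(n_buckets)]

# Made-up counts: many small values and a few large ones.
sample_counts = [1, 1, 2, 3, 5, 8, 40, 300, 2500, 12375]
linear_hist = histogram(sample_counts, max_count=12375)
```

Even in this toy sample, eight of the ten values fall into the first bucket, which is the same pile-up the real histogram shows.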

Linear with upper limit

This histogram also uses a linear segmentation, but with an upper limit (or clamp). Any zip code with 10 or more admissions is put into the red bucket. This would be fine for comparing admissions from distant zip codes, but we wouldn’t be able to distinguish visually between admissions from zip codes around Indianapolis.
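A clamped version might look like the sketch below. The post only says that counts of 10 or more go to the red bucket, so the exact boundaries of the lower buckets, the treatment of zero counts, and the name `clamped_bucket` are assumptions.

```python
def clamped_bucket(count, limit=10, n_buckets=10):
    """Linear buckets below `limit`, plus one overflow bucket at the top."""
    if count >= limit:
        # Everything at or above the limit shares the last (red) bucket.
        return n_buckets - 1
    # Assumption: zip codes with no admissions are absent from the data,
    # so anything below 1 is treated as 1.
    count = max(count, 1)
    # The remaining n_buckets - 1 buckets split the range 1..limit-1 evenly.
    return (count - 1) * (n_buckets - 1) // (limit - 1)

# A nearby zip code with thousands of visits and one with just ten
# become indistinguishable.
same_color = clamped_bucket(12375) == clamped_bucket(10)
```

With the default settings, counts 1 through 9 each get their own bucket, which is why this mapping separates distant zip codes well but flattens everything near the museum.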

Logarithmic

This is where the beauty of logarithms comes to the rescue. Let’s use our example to walk through the process. First we determine the logarithm of our maximum number, 12,375, in base 2 (working in base 2 is very efficient for computers). The result is 13.59. The highest bucket represents 12,375 in the real world, and 13.59 in exponent-land. Each bucket represents a tenth of 13.59 in exponent-land, just as in our first histogram each bucket represented a tenth of 12,375. We then simply derive the numbers that each bucket represents in the real world; 2 to the 1.359th power is about 2.57, which becomes 3 when rounded to a whole number, for example. These whole numbers in the real world are then used to put the zip codes in the proper bins.
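The steps above can be sketched in a few lines. The function names are hypothetical, and the choice to treat each whole number as an inclusive upper edge of its bucket is an assumption, since the post does not say how ties at a boundary are handled.

```python
import math
from bisect import bisect_left

def log_bucket_edges(max_count, n_buckets=10):
    """Upper edge of each bucket, evenly spaced in log2 space."""
    step = math.log2(max_count) / n_buckets    # about 1.359 for 12,375
    return [round(2 ** (step * i)) for i in range(1, n_buckets + 1)]

def log_bucket(count, edges):
    """Index of the first bucket whose upper edge is >= count."""
    # Counts above the last edge are clamped into the top bucket.
    return min(len(edges) - 1, bisect_left(edges, count))

edges = log_bucket_edges(12375)
lowest = log_bucket(1, edges)        # a zip code with a single admission
highest = log_bucket(12375, edges)   # the busiest zip code
```

Here `edges[0]` is 3, matching the worked example, and `edges[-1]` is 12,375. Passing a different `n_buckets` changes the number of legend colors without changing the method.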

The result is that we have teased out more meaning to bestow upon our colors. There are still thousands of zip codes in those blue buckets, but they are the ones that are scattered across the nation on our map, so here we are aided by spatial distribution. In the red zone, there are dozens of zip codes in each bucket, and this is just what we want to be able to analyze the subtleties around Indianapolis where the icons are more spatially dense. It also turns out that this trend is pretty much independent of the date range, and the colormap still works out well at the state level.

On the actual map we use eight icons rather than ten to reduce clutter in the legend, but the algorithm is the same.

ZIPskinny demographics

You might notice that we use a different type of colormap in the demographic windows. We actually tried a smooth colormap for the more linear demographics (Income, Age, etc.). The problem was that it was then very difficult to visually relate a particular shade of blue or green to a particular age or income level, which would almost certainly lead to eye strain and headaches. So here we used a more rainbow-like colormap to make that analysis easier.

In summary, colormap selection is a complex process involving aspects of mathematics, design, and the nature of the dataset. It’s one of those areas where art and science come together and the best practices of each can lead to successful communication of concepts and beautiful representations of data.

Filed under: Design, Technology

3 Responses to “There’s more to color than meets the eye”

  • Emily Says:

    Ed,
    I am loving that the IMA keeps coming up with new ways to be more transparent. Max raved about this when he spoke to my class last week. It is so well done!

  • Ed Says:

    Thanks, Emily! It’s really rewarding to hear from people who are excited about the work that we do.

  • Meg Says:

    Ed-I’m completely in awe of your brain. Awesome!
