Wednesday, June 10, 2015

rating scales

In color science we sometimes have the need to elicit consensus information about an attribute. This is done with a psychometric scale. Usually we have a number of related questions. The term scale refers to the set of all questions, while the line typically used to elicit the response to a question is called an item. The tick marks on the line for an item are the categories.

When I get manuscripts to review, the endpoint categories are often adjective pairs like dark – bright or cold – warm. Such a scale is called a semantic differential. Essentially, people put the term they are evaluating on the right side and an antonym on the left side. A common problem is that such a pair of terms does not translate well from one language to another, because the pairing is culture dependent. Manuscripts in English reporting on work carried out in a completely different language are therefore often difficult to assess.

The safe approach is to use a Likert scale, where the 'i' in Likert is short and not a diphthong, as it is typically mispronounced by Americans. In the Likert scale the extreme points for all items in the scale are strongly disagree – strongly agree. The question is now how many points the scale should have. When you need a neutral option the number is odd, otherwise it is even.

For the actual number I often see 5 and 7 quoted, maybe in reference to George Miller's 7±2 paper (G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81–97, 1956). However, picking a fixed number in this way is incorrect, and it is also incorrect to use intervals of the same length between the categories.

The correct way is to do a two-step experiment. In the first step the observers are experts in the subject matter and the scale is a set of blank lines without tick marks or labels. These experts are asked to put a mark on each line to indicate how strongly they agree. You need about 1500 observations: if you have a scale with 10 items, you need about 150 experts, since each expert contributes one observation per item. The exact number of observations depends on the required statistical significance.

You then perform cluster analysis on their answers to find the categories. This gives you the number of tick marks and their locations. With these you can produce a questionnaire to use in a shopping mall or in the cafeteria to obtain the responses from a large number of observers. For more information on the statistics behind this, a good paper is J. H. Munshi. A method for constructing Likert scales. Available at SSRN, April 2014.
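To make the clustering step concrete, here is a minimal sketch in Python. It is not the procedure from the paper cited above: it assumes that each expert's mark is recorded as a position between 0 and 1 along the line, that a plain one-dimensional k-means is an acceptable clustering method, and that the number of clusters k is supplied by the analyst (for example by comparing several values). The function names kmeans_1d and category_boundaries are made up for this illustration.

```python
def kmeans_1d(marks, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 1-D positions.

    marks: positions of the experts' marks along the line, e.g. in [0, 1].
    k: assumed number of categories, chosen by the analyst.
    Returns the k cluster centers in increasing order (the tick marks).
    """
    data = sorted(marks)
    n = len(data)
    if k < 1 or k > n:
        raise ValueError("k must be between 1 and the number of marks")
    # Deterministic start: one center in the middle of each of k equal slices.
    centers = [data[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous center.
        new_centers = sorted(
            sum(g) / len(g) if g else centers[i]
            for i, g in enumerate(groups)
        )
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def category_boundaries(centers):
    """Midpoints between adjacent tick marks, separating the categories."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Hypothetical marks from nine experts on one item.
marks = [0.05, 0.07, 0.10, 0.48, 0.50, 0.52, 0.90, 0.93, 0.95]
ticks = kmeans_1d(marks, k=3)
print(ticks)
print(category_boundaries(ticks))
```

Note that the resulting tick marks need not be equally spaced, which is the point of running the first step at all.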

After you have evaluated your experiment and produced the table with the results, you need to visualize them. The last thing you want to do is to draw pie charts: they are meaningless! Use a good visualizer like Tableau. If you use R, use the HH package. A good paper is R. M. Heiberger and N. B. Robbins. Design of diverging stacked bar charts for Likert scales and other applications. J. Stat. Softw., 57:1–32, 2014.
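If you want to see what a diverging stacked bar chart does with the numbers before handing them to a plotting tool, the layout can be sketched in a few lines of Python. This is only an illustration of the general idea, not the HH package's implementation: it assumes the counts for one item are ordered from strongly disagree to strongly agree, and that with an odd number of categories the neutral one is split evenly around zero. The function name diverging_segments is made up for this example.

```python
def diverging_segments(counts):
    """Horizontal extent of each category in a diverging stacked bar.

    counts: number of responses per category for one item, ordered from
            strongly disagree to strongly agree.
    Returns one (start, end) pair per category, in percent of all
    responses, with zero placed at the middle of the scale.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    percents = [100.0 * c / total for c in counts]
    half = len(counts) // 2
    # Everything on the disagree side lies to the left of zero.
    left = sum(percents[:half])
    if len(counts) % 2 == 1:
        # Odd number of categories: the neutral one straddles zero.
        left += percents[half] / 2
    segments = []
    start = -left
    for p in percents:
        segments.append((start, start + p))
        start += p
    return segments


# Hypothetical counts for one 5-point item.
for segment in diverging_segments([10, 20, 40, 20, 10]):
    print(segment)
```

Each pair gives the left and right edge of one colored segment of the bar, so the segments can be drawn with any tool that supports horizontal bars with an offset.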