Intraclass Correlation: Picturebook Econometrics
I presented a paper (published in Political Analysis (link)) in my methods class this week on MRP (multilevel regression and poststratification), a technique for estimating state-level public opinion from national surveys. One of the terms I needed to define is straight out of Downton Abbey (which Wife, R.N. and I are ripping through at the moment). My analogy went over well in class, so I thought, why not blog it?
"Intraclass correlation" (ICC) is a key concept in making MRP work (see Andrew Gelman's blog for a nice overview). Being immersed in the Downton world, it seemed obvious to me: if we think about a measure on which all the aristocratic Crawleys are similar and the staff are all alike (say, socioeconomic status), then there would be high ICC. However, I was corrected by my professor Marc Meredith (sidenote: he calls MRP "Mr. P" without irony), who noted that the key to high ICC is variation between groups (regions, in the case of US states) in addition to similarity among the individuals within each group. This led me to create the following hypothetical, and accompanying graphs, to illustrate the point.
Say there is a poll asking whether the public approves of the maple syrup industry and of the champagne industry. Furthermore, assume that national approval of each is close to 50%. Although both products are delicious, some states like each one more than others. Champagne is French, so no region is particularly wild about it. The make-believe country looks like this.
Champagne is also kind of fancy, so maybe some states like it slightly better than others (RI would prefer it to more rural Maine, for instance), but there's no real pattern across a region like New England.
Therefore, there is low variation between regions and high variation within regions. So, overall, approval of the champagne industry has LOW ICC.
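For readers who like a formula, the usual variance-ratio definition of ICC makes the "between versus within" logic explicit. The symbols here are my own notation, not anything from the MRP paper: the first variance is the variance of the regional averages, and the second is the variance of states around their own region's average.

```latex
\mathrm{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}
```

For champagne, the between-region variance is tiny compared with the within-region variance, so the ratio sits close to 0.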
Now, maple syrup is a different story! Because it is so tied to cold climates, and marketed as emanating from a single state (Vermont), there is major variation in how different regions of the US approve of the maple syrup industry. Regions closer to Vermont like it much more, especially New England.
And because New Englanders stick together on important issues like this (except for the Yankees fans in Connecticut), most of New England feels about the same way about it.
In the hypothetical maple syrup case, ICC is high.
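The two cases can be mimicked with a small simulation. This is only a sketch under made-up assumptions: the 20 regions of 5 states each, the standard deviations, and the function names `icc_oneway` and `simulate_approval` are all invented for illustration and are not taken from any real poll or from the MRP paper. It uses the standard one-way ANOVA estimator of ICC for equal-sized groups.

```python
import numpy as np


def icc_oneway(values):
    """One-way ANOVA estimate of ICC for a balanced (groups x members) array."""
    values = np.asarray(values, dtype=float)
    n_groups, k = values.shape
    group_means = values.mean(axis=1)
    grand_mean = values.mean()
    # Mean square between groups and mean square within groups.
    msb = k * np.sum((group_means - grand_mean) ** 2) / (n_groups - 1)
    msw = np.sum((values - group_means[:, None]) ** 2) / (n_groups * (k - 1))
    # This estimator can come out slightly negative when the true ICC is near 0.
    return (msb - msw) / (msb + (k - 1) * msw)


def simulate_approval(rng, n_regions, states_per_region, region_sd, state_sd):
    """State approval = 50% + a shared regional effect + state-specific noise."""
    region_effect = rng.normal(0.0, region_sd, size=(n_regions, 1))
    state_noise = rng.normal(0.0, state_sd, size=(n_regions, states_per_region))
    return 0.5 + region_effect + state_noise


rng = np.random.default_rng(0)

# Champagne: regions barely differ, but states within a region differ a lot.
champagne = simulate_approval(rng, 20, 5, region_sd=0.01, state_sd=0.08)

# Maple syrup: regions differ a lot, and states within a region agree.
maple = simulate_approval(rng, 20, 5, region_sd=0.15, state_sd=0.03)

print("champagne ICC:", round(icc_oneway(champagne), 3))
print("maple syrup ICC:", round(icc_oneway(maple), 3))
```

With these invented settings, the champagne estimate should land near zero and the maple syrup estimate near one, which is the same contrast the maps are meant to show.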
Why does all this matter? In the case of MRP, intraclass correlation is a good thing. In typical econometrics, however, it's a cause for concern (called the Moulton problem or clustering problem, it usually requires cluster-robust standard errors). And as in Downton Abbey, it's a reminder that not all the interesting action is within the classes.
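To give a rough sense of why econometricians worry, here is a small sketch of the textbook Moulton factor. It assumes the simplest setting (equal-sized clusters and a regressor that does not vary within a cluster), and the function name `moulton_se_inflation` is my own, hypothetical label. Under those assumptions, conventional standard errors are too small by a factor of the square root of 1 + (n - 1) × ICC, where n is the cluster size.

```python
import math


def moulton_se_inflation(cluster_size, icc):
    """Factor by which naive standard errors understate the truth.

    Assumes equal cluster sizes and a regressor that is constant
    within each cluster (the simplest Moulton setting).
    """
    return math.sqrt(1.0 + (cluster_size - 1) * icc)


# Hypothetical example: 5 states per region, at a low and a high ICC.
for icc in (0.02, 0.75):
    print(icc, moulton_se_inflation(5, icc))
```

Even with only five states per region, a high ICC means the naive standard errors would need to be scaled up substantially, while a low ICC leaves them nearly unchanged.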




 
 
