Plotting Games-Howell ANOVA post hoc differences in R

I am a regular reader of the Statistics by Jim blog and the motivation for my post stems from a comment I answered on one of his blog posts.

Background: Classic One-Way, Welch ANOVA, and post hoc tests

One of his recent posts discussed using a Welch’s ANOVA to the ‘Classic’ One-Way ANOVA. For those less familiar, the classic One-Way Analysis of Variance (ANOVA) allows us to compare the means of three or more groups. It uses Fisher’s F-test to determine if the mean of one of those groups is statistically different from the other means. Like the Student’s t-test, there is an assumption that each group has equal variance. Like Welch’s t-test, Welch’s ANOVA relaxes the equal variance assumption. In his blog post, Jim goes into greater detail and example on how the Welch ANOVA is less likely to produce Type I errors when there is unequal variance.

Post hoc test allow us a closer look at the pairwise differences between group means. There are many different post hoc tests, each with their own nuances such as how conservatively they adjust the pairwise p-values. Among the most popular post hoc tests for the classic One-Way ANOVA is Tukey’s HSD (honest significant difference).

For the Welch ANOVA, Jim opted to use the Games-Howell post hoc test. It is similar to the Tukey HSD, but unlike Tukey HSD, it does not assume groups to have the same standard deviation. This relaxed assumption goes well with the Welch ANOVA.

Reading post hoc results from raw software output is fine, but plotting the results can be much more visually appealing and help quickly sift through the results. Jim used Minitab to create his Games-Howell post hoc plot and was unsure how to help someone that asked him how to create the plot in R. I didn’t know how either, but it fired up my curiosity. Here are my results.

Creating a Games-Howell post hoc plot in R

Plotting the post hoc for a Tukey HSD is simple. Just run save the results of your Classic One-Way ANOVA as an object and place it inside of an Tukey HSD function, which can be nested into a plot function, like so:

aov.classic <- aov(DV~IV, data=dataset)
plot(TukeyHSD(aov.classic))
# alternatively save the Tukey results as an object and plot it
aov.classic.tukey <- TukeyHSD(aov.classic)
plot(aov.classic.tukey)

I wasn’t able to find a similar plot option for the Games-Howell post hoc results, but I’ll come to that later.

First, I had to install the userfriendlyscience package to access the posthocTGH function: https://rpubs.com/aaronsc32/games-howell-test. The posthocTGH reports the frequency, mean, and variance for each group. Saving the results of that function as an object (here as gh) provides the following attributes.

gh <- posthocTGH(y=dataset$DV, x=dataset$IV)
attributes(gh)
$names
[1] "input"        "intermediate" "output"      
$class
[1] "posthocTGH"

Digging a bit more into the attributes for each of the names yields some interesting data for later use.

attributes(gh$input)
$names
[1] "y"            "x"           
[3] "method"       "conf.level"  
[5] "digits"       "p.adjust"    
[7] "formatPvalue"
attributes(gh$intermediate)
$names
 [1] "x"               "y"              
 [3] "n"               "groups"         
 [5] "df"              "means"          
 [7] "variances"       "names"          
 [9] "pairNames"       "descriptives"   
[11] "errorVariance"   "se"             
[13] "dmeans"          "t"              
[15] "p.tukey"         "alpha"          
[17] "qcrit"           "tukey.low"      
[19] "tukey.high"      "df.corrected"   
[21] "se.corrected"    "t.corrected"    
[23] "qcrit.corrected" "gh.low"         
[25] "gh.high"         "p.gameshowell"  
attributes(gh$output)
$names
[1] "tukey"        "games.howell"

Now that I have the Games-Howell results, I found & tweaked a TukeyHSD function code to work with that post hoc. Note: TukeyHSD function is in the stats package that typically loads with R at the start. The plotTukeyHSD function by Nathan Brouwern that I modified appears to offer a very similar plot. I obtained the TukeyHSD code from: https://rpubs.com/brouwern/plotTukeyHSD2 Below is are the first few lines of code that needed modification.

#### The function STARTS here ####
plotTukeyHSD <- plotTukeysHSD <- function(tukey.out,
                           x.axis.label = "Comparison",
                           y.axis.label = "Effect Size",
                           axis.adjust = 0,
                           adjust.x.spacing = 5){
  
       tukey.out <- as.data.frame(tukey.out[[1]])
       means <- tukey.out$diff
       categories <- row.names(tukey.out)
       groups <- length(categories)
       ci.low <- tukey.out$lwr
       ci.up  <- tukey.out$upr   

The code snippet above uses a object defined tukey.out which contains the stored results of a TukeyHSD function. I modified the snippet to instead use the saved posthocTGH results, defined as object ght.out, and use extracted pair-names, means, and CI values.

#### The function STARTS here ####
plotGamesHowell <- function(ght.out,
                           x.axis.label = "Comparison",
                           y.axis.label = "Effect Size",
                       axis.adjust = 0,
                       adjust.x.spacing = 5){
  
       #ght.out <- as.data.frame(ght.out[[1]]) # not used 
       means <- ght.out$intermediate$dmeans
       categories <- ght.out$intermediate$pairNames
       groups <- length(categories)
       ci.low <- ght.out$intermediate$gh.low
       ci.up  <- ght.out$intermediate$gh.high 

Here is a link to my modified code in Github: https://github.com/shawnjanzen/plot-for-games-howell-anova-post-hoc-test/blob/master/games-howell-post-hoc-plot-custom-function.R

Visualizing a Games-Howell post hoc plot in R

Let’s use the trusty (and lovingly overused) mtcars dataset built into R to help with an example. Since this is an ANOVA, let’s compare if cars differ on average by their quarter-mile time by the number of cylinders. Cylinders is stored numerically, so I will need to treat it as a factor in the ANOVA.

library(userfriendlyscience) # activate package
data(mtcars)  # load data
str(subset(mtcars, select=c(cyl, qsec)))  # examine variables
'data.frame':	32 obs. of  2 variables:
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...

As usual, I like to take quick look at the summary stats before diving into an analysis. There are an uneven number of observations in each cylinder group, but not too awful since the smallest (7) is half as many as the largest (14).

library(summarytools)
freq(mtcars$cyl)
descr(mtcars$qsec, transpose=TRUE, stats="common")
cbind(var=round(by(data=mtcars$qsec, INDICES=mtcars$cyl, FUN=var, na.rm=TRUE),2),
      sd=round(by(data=mtcars$qsec, INDICES=mtcars$cyl, FUN=sd, na.rm=TRUE),2))
Frequencies  
mtcars$cyl  
Type: Numeric  

              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------ --------- -------------- --------- --------------
          4     11     34.38          34.38     34.38          34.38
          6      7     21.88          56.25     21.88          56.25
          8     14     43.75         100.00     43.75         100.00
       <NA>      0                               0.00         100.00
      Total     32    100.00         100.00    100.00         100.00
Descriptive Statistics  
mtcars$qsec  
N: 32  

              Mean   Std.Dev     Min   Median     Max   N.Valid   Pct.Valid
---------- ------- --------- ------- -------- ------- --------- -----------
      qsec   17.85      1.79   14.50    17.71   22.90     32.00      100.00
   var   sd
4 2.83 1.68
6 2.91 1.71
8 1.43 1.20

Eyeballing the variance & standard deviations above, they look different but a Brown-Forsythe test shows they are not different enough to say the variances are not equal. So I could use the classic One-Way ANOVA & Tukey HSD here, but I’ll continue to showcase the Welch ANOVA & Games-Howell post hoc and report both post hoc plots.

library(car)
leveneTest(qsec ~ as.factor(cyl), data=mtcars, center=mean)
Levene's Test for Homogeneity of Variance (center = mean)
      Df F value Pr(>F)
group  2  0.6759 0.5165
      29 

First the results from the Classic One-Way ANOVA and then the Welch ANOVA, both using functions from the stats package.

aov.classic <- aov(qsec ~ as.factor(cyl), data=mtcars)
summary(aov.classic)
               Df Sum Sq Mean Sq F value  Pr(>F)   
as.factor(cyl)  2  34.61   17.30   7.794 0.00196 **
Residuals      29  64.38    2.22                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
oneway.test(qsec ~ as.factor(cyl), data=mtcars, var.equal=FALSE)
	One-way analysis of means (not assuming equal variances)

data:  qsec and as.factor(cyl)
F = 7.7044, num df = 2.000, denom df = 14.047, p-value = 0.005512

As expected, the results are essentially the same, but let’s jump to the post hoc tests. Reminder, posthocTGH is from the userfriendlyscience package.

Tangent: Also part of the userfriendlyscience package is a oneway function that gives the ANVOA summary results, effect size, and a more detailed post hoc in a single function than the two usual summary of aov and TukeyHSD functions. According to the help documentation, the oneway function allows post hoc options for: Tukey, Games-Howell, Holm, Hochberg, Hommel, Bonferroni, Benjamin-Hochberg (BH), Benjamini & Yekutieli (BY), False Discovery Rate (FDR), or none. The posthocTGH will only do Games-Howell or Tukey.

TukeyHSD(aov.classic)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = qsec ~ as.factor(cyl), data = mtcars)

$`as.factor(cyl)`
        diff       lwr        upr     p adj
6-4 -1.16013 -2.939271  0.6190113 0.2574564
8-4 -2.36513 -3.847748 -0.8825122 0.0013300
8-6 -1.20500 -2.908398  0.4983980 0.2053353
gh.test <- posthocTGH(y=mtcars$qsec, x=as.factor(mtcars$cyl), method="games-howell")
gh.test
   n means variances
4 11    19       2.8
6  7    18       2.9
8 14    17       1.4

    diff ci.lo ci.hi   t   df    p
6-4 -1.2  -3.3  1.01 1.4 12.8  .36
8-4 -2.4  -3.9 -0.83 3.9 17.4 <.01
8-6 -1.2  -3.2  0.80 1.7  9.1  .27

Visualizing these as post hoc plots.

plot(TukeyHSD(aov.classic), las=1) # rotate axis labels
Tukey HSD post hoc plot
plotGamesHowell(gh.test)
Games-Howell post hoc plot

And there we go. A new plot for the Games-Howell post hoc test results derived from modifying a Tukey HSD plot. Thanks to those that have come before me and made your code available to everyone.

Empowering your intern & job search with informational interviews

Many students ask me for help with finding internships and jobs. Most of the time, they specifically ask about resume reviews and job leads. Getting another set of eyes on your resume and portfolios is important, and I’m happy to help when I can. However, I cannot stress the importance of taking self-empowering actions that can dramatically help boost your intern & job search experience. One way to do this is through informational interviews.

Although this post focuses on informational interviews, many of the search and engagement methods I suggest are applicable to job searches.

What is an informational interview?

Most of us are familiar with the traditional job interview, where you meet with one or more people interested in determining if you are the right person for the job. In the job interview, you are in the hot seat and most of the questions are directed at you to help assess your KSAs*, competence, attitude, and other types of fit with the job and organization.
*KSAs = knowledge, skills, & abilities

Continue reading “Empowering your intern & job search with informational interviews”

Who is the Social State?

If you ever want to catch my ear, just start talking about government use of social media.  It is a fascinating cornucopia of opportunities for applied research and practical policy work.  This thought-provoking post by Katherine Barrett & Richard Greene over on Governing motivated me to post my first blog topic, to continue this conversation into one of my areas of interest–the person(s) and policies behind the profile.  I’ll keep this relatively short, more questions than answers, something get my thoughts out now and revisit later.

Any technology potent enough to be interesting will inevitably destabilize existing institutions, power relationships, social structures, reigning economic and technological systems, and cultural assumptions.” – Brad Allenby [1]

Social media and social networks existed long before the Internet.  However, the long arm of the Internet’s reach and speed has given rise to communities of digital media unlike anything we’ve ever known.  Seven in ten people in the US use social media, 8 in 10 people age 49 and under.  Government has also jumped on this ubiquitous communication train.  Public administrators and legislators at every level engage the public, stakeholders, and each other.

Some in government hopped onto social media platforms as a natural extension of their duties, such as with legislators or administrators within public-facing offices.  Others saw social media as “part of the presidential mandate: ‘We had to use social media to accomplish the goals of the Open Government and Transparency initiative.’” [2]  When done right, social media allows members of government to increase transparency, accountability, and increase civic engagement through greater public consultation and participatory democracy.  But what does it mean to do it right?

Continue reading “Who is the Social State?”