Overview

This tutorial provides a guide for calculating diversity-type intraindividual variability (IIV) constructs, such as emodiversity using intensive longitudinal data (e.g. daily diary, ecological momentary assessment, experience sampling, etc). Examples of diversity-type IIV constructs include, but are not limited to, emodiversity, social diversity, activity diversity, and stressor diversity. Please see the corresponding paper “Fusing biodiversity metrics into investigations of daily life and healthy aging: Illustrations and recommendations with emodiversity” (Benson, Ram, Almeida, Zautra, & Ong, under review).

Outline

  1. Data preparation
  2. Select items
  3. Select measurement scale (continuous or binary)
  4. Aggregate data
  5. Choose diversity metric (e.g. Gini coefficient, Shannon’s entropy, Simpson’s index) & calculate diversity scores
  6. Plot emodiversity

Step 0: Data Preparation

For the purposes of this guide, we utilize a toy dataset including design variables (id and time) and 10 emotion variables:

Generate id and time data:

id <- sort(rep(seq(1,150),30))
time <- rep(seq(1,30),150)

design <- data.frame(id,time)

Generate emotion data:

enthusiastic <- round(rnorm(4500,mean=2,sd=1),0)
amused <- round(rnorm(4500,mean=2,sd=1),0)
proud <- round(rnorm(4500,mean=-1,sd=1),0)
interested <- round(rnorm(4500,mean=2,sd=1),0)
calm <- round(rnorm(4500,mean=2,sd=1),0)
sad <- round(rnorm(4500,mean=2,sd=1),0)
ashamed <- round(rnorm(4500,mean=2,sd=1),0)
guilty <- round(rnorm(4500,mean=2,sd=1),0)
hostile <- round(rnorm(4500,mean=2,sd=1),0)
nervous <- round(rnorm(4500,mean=-1,sd=1),0)

emotion <- data.frame(enthusiastic, amused, proud, interested, calm, 
                      sad, ashamed, guilty, hostile, nervous)

emotion[emotion < 1 | emotion > 5] <- 1

Create data frame with the design and emotion items:

df1 <- data.frame(design,emotion)

Plot the time series for one participant:

# Subset to get data for just participant #1
    id1 <- df1[which(df1$id==1),]

# Reshape from long format to wide format
    #install.packages("reshape")
    library(reshape)

    # Melt datasets 
        id1_melt <- melt(id1,id=c("id","time"))
  
    # Rename variables
        names(id1_melt) <- c("id","Time","EmotionLabel","EmotionValue")
        
# Plot
    #install.packages("ggplot2")
    library(ggplot2)
    ggplot(id1_melt, aes(x = Time, y = EmotionValue, group = EmotionLabel, colour = EmotionLabel)) + 
        geom_line()


Step 1: Select Items

First, select the items that you will use to calculate diversity.

For the purposes of this demonstration, we utilize:

It is also possible to use other types of categories including (but not limited to) various types of stressors (e.g. work, family) or various types of social relationships (e.g. family member, friend, coworker).


Step 2: Measurement Scale

Second, decide what type of measurement scale to use.

Two common options are continuous Likert-type (e.g. 1 to 5) and binary (e.g. “yes”/no“) measurement scales.

For the purposes of this tutorial, we demonstrate how to utilize both the continuous and binary scale options, based on the original Likert-type 1 to 5 scale that was used in the study design.

To recode the items, we use the recode function in the car package.

#install.packages("car")
library(car)

For the continuous scale option, we need to recode the items so that the minimum scale value is zero (rather than 1 in this case). This is necessary so that when we aggregate the data, responses corresponding to “none of the time” are not counted as emotion experiences.

vars_to_change <- c('enthusiastic', 'amused', 'proud', 'interested', 'calm',
                    'sad', 'ashamed', 'guilty', 'hostile', 'nervous')

    for(x in 1:length(vars_to_change)){
        curr_var = vars_to_change[x]
        new_var = paste(c(curr_var, "_c"),collapse='')
        df1[,new_var] = recode(df1[,curr_var],"NA=0;1=0;2=1;3=2;4=3;5=4") 
        # making assumption that NA means they didn't experience the emotion
    }

For the binary scale option, we also need to recode items, if they were originally measured on a likert-type scale. It is up to the researcher on where to make the cut between absence and presence of a certain experience. For the purposes of this example, we recode values of 1 to 0 and values of 2-5 to 1.

vars_to_change <- c('enthusiastic', 'amused', 'proud', 'interested', 'calm',
                    'sad', 'ashamed', 'guilty', 'hostile', 'nervous')

    for(x in 1:length(vars_to_change)){
        curr_var = vars_to_change[x]
        new_var = paste(c(curr_var, "_b"),collapse='')
        df1[,new_var] = recode(df1[,curr_var],"NA=0;1=0;2=1;3=1;4=1;5=1") 
        # making assumption that NA means they didn't experience the emotion
    }

Similarly, for data that are on a continuous scale, such as 0-100, the researcher will need to decide what if anything needs to be done with the scale. For research questions examining the diversity in the intensity of emotion experiences, maintaining this 0-100 scale is approporiate. For research questions examining the diversity in emotion experiences, some type of re-coding will need to be done so that certain values in the range correspond to “absence/0” and the other values correspond to “presence/1”.

Note on missing data: if there are completely missing rows of relevant data for a participant on a given occasion, these should be removed. If missing values occur for some items, but not others for a given occasion, possibilities include either removing the entire row of data or recoding missing values to 0, which makes the assumption that data are missing for this item because it wasn’t experienced on that particular occasion.


Step 3: Compute Frequencies across Occasions

Third, aggregate the multi-occasion data into counts over time for each item.

For this example, we utilize the binary scale items, but the process is the same when using the continuous scale variables. There are many ways to aggregate data, and for this example we chose to use the ddply function within the plyr package.

#install.packages("plyr")
library(plyr)

df2 <- ddply(df1, "id", summarize,
              enthusiastic = sum(enthusiastic_b, na.rm = TRUE),
              amused = sum(amused_b, na.rm = TRUE),
              proud = sum(proud_b, na.rm = TRUE),
              interested = sum(interested_b, na.rm = TRUE),
              calm = sum(calm_b, na.rm = TRUE),
              sad = sum(sad_b, na.rm = TRUE),
              ashamed = sum(ashamed_b, na.rm = TRUE),
              guilty = sum(guilty_b, na.rm = TRUE),
              hostile = sum(hostile_b, na.rm = TRUE),
              nervous = sum(nervous_b, na.rm = TRUE))

Step 4: Choose Diversity Metric & Calculate Diversity Scores

Fourth, select a diversity metric to use.

There are many available diversity metrics, most of which are similar to some extent, with nuance as to which may be more or less useful in particular contexts.

Two options to aid in this choice include calculating a series of indices and looking at both the correlations among indices as well as comparing their distributional features.

Four potential diversity metrics include the Gini coefficient, richness, Shannon’s entropy, and Simpson’s index.

For this example, we calculate and compare all four of diversity metrics:

The Gini function within the ineq package is used to calculate the Gini coefficient.

#install.packages("ineq")
library(ineq)

The diversity function within the vegan package is used to calculate Shannon’s entropy and Simpson’s index.

#install.packages("vegan")
library(vegan)

Calculate diversity scores using the Gini coefficient:

\(GiniDiversity_{i} = G_{i} = 1 - \left ( \left( \frac{2\sum_{j=1}^m jc_{ij}}{m\sum_{j=1}^mc_{ij}} \right) - \frac{m+1}{m} \right)\)

where \(c_{ij}\) is the count of individual i’s experiences within j = 1 to m categories (e.g. species, emotion types) indexed in non-decreasing order \((c_{ij} < c_{ij+1})\). Diversity scores range from 0 to 1, with higher values indicating more “equitable” or diverse (emotional) ecosystems.

Calculate positive and negative emodiversity:

vars <- 3 # 1 column for id, 1 column for positive emodiversity, and 1 column for negative emodiversity 
giniDat <- matrix(data=NA, nrow=length(df2$id), ncol=vars)
idlist <- df2$id

posEmotions <- c("enthusiastic","amused","proud","interested","calm")
negEmotions <- c("sad","ashamed","guilty","hostile","nervous")

sets <- list(posEmotions=posEmotions, negEmotions=negEmotions)

    for(i in 1:length(idlist)){
        for(x in 1:length(sets)){
        curr_set = unlist(sets[x])
        data <- df2[i,curr_set]
        giniDat[i,1] <- idlist[i]
        giniDat[i,x+1] <- 1-(Gini(data,na.rm=TRUE))
        }
    }

giniDat <- data.frame(giniDat)
names(giniDat) <- c("id","pos_emodiv_g","neg_emodiv_g")
df3 <- merge(df2,giniDat, by="id")

Richness:

\(RichnessDiversity_{i} = R_{i}\)

where \(R_{i}\) is the number of j = 1 to m categories.

Calculate positive and negative emodiversity:

df4 <- matrix(data=NA,nrow=150,ncol=3)
idlist <- df3$id
sets <- list(posEmotions=posEmotions, negEmotions=negEmotions)
    
    for(i in 1:length(idlist)){
        for(x in 1:length(sets)){
        curr_set = unlist(sets[x])
        data <- df3[i,curr_set]
        df4[i,1] <- idlist[i]
        df4[i,x+1] <- length(which(data >=1))
        }
    }
    
# Set variable names
    colnames(df4) <- c("id","pos_emodiv_r","neg_emodiv_r")

# Convert to data frame object
    df4 <- data.frame(df4)
    
# Combine with other data
    df5 <- merge(df3,df4,by=c("id"), all=TRUE)

Shannon’s entropy:

\(ShannonDiversity_{i} = H_{i} = - \sum_{j=1}^m p_{ij}lnp_{ij}\)

where m is the number of discrete emotion types (i.e. angry, sad, enthusiastic), and \(p_{ij}\) is the proportion of individual i’s experiences that were of each discrete emotion type, j = 1 to m. Scores can range from 0 to ln(m), with higher scores indicating greater diversity (richness and evenness).

Calculate positive emodiversity:

df5$pos_emodiv_h <- diversity(df5[posEmotions], index = "shannon")

Calculate negative emodiversity:

df5$neg_emodiv_h <- diversity(df5[negEmotions], index = "shannon")

Simpson’s index:

Simpson’s index has two parameterizations that, although scaled differently, may be interpreted as the probability that any two randomly selected experiences are of different emotion types. The inverse version is calculated as,

\(SimpsonDiversity_{i} = D_{i} = 1 - \sum_{j=1}^m p_{ij}^2\)

and ranges from 0 to 1. Higher scores indicate greater diversity (richness and evenness) of emotional experiences.

Calculate positive emodiversity:

df5$pos_emodiv_d <- diversity(df5[posEmotions], index = "simpson")

Calculate negative emodiversity:

df5$neg_emodiv_d <- diversity(df5[negEmotions], index = "simpson")

Descriptives:

Calculate the descriptives for the positive and negative emodiversity indices:

#install.packages("psych")
library(psych)

describe(df5[,-1])
##              vars   n  mean   sd median trimmed  mad   min   max range
## enthusiastic    1 150 20.19 2.54  20.00   20.23 2.97 12.00 26.00 14.00
## amused          2 150 20.72 2.44  21.00   20.74 2.97 15.00 26.00 11.00
## proud           3 150  0.12 0.35   0.00    0.02 0.00  0.00  2.00  2.00
## interested      4 150 20.63 2.46  21.00   20.59 2.97 15.00 27.00 12.00
## calm            5 150 21.00 2.60  21.00   21.03 2.97 13.00 27.00 14.00
## sad             6 150 20.67 2.26  20.50   20.68 2.22 14.00 27.00 13.00
## ashamed         7 150 20.31 2.39  20.00   20.34 2.97 14.00 26.00 12.00
## guilty          8 150 20.81 2.37  21.00   20.93 2.97 13.00 25.00 12.00
## hostile         9 150 20.63 2.67  21.00   20.69 2.97 13.00 27.00 14.00
## nervous        10 150  0.19 0.41   0.00    0.10 0.00  0.00  2.00  2.00
## pos_emodiv_g   11 150  0.76 0.02   0.76    0.76 0.02  0.71  0.80  0.09
## neg_emodiv_g   12 150  0.76 0.02   0.76    0.76 0.02  0.70  0.80  0.10
## pos_emodiv_r   13 150  4.11 0.32   4.00    4.02 0.00  4.00  5.00  1.00
## neg_emodiv_r   14 150  4.18 0.39   4.00    4.10 0.00  4.00  5.00  1.00
## pos_emodiv_h   15 150  1.39 0.02   1.38    1.38 0.00  1.36  1.46  0.10
## neg_emodiv_h   16 150  1.39 0.02   1.38    1.39 0.00  1.36  1.44  0.08
## pos_emodiv_d   17 150  0.75 0.00   0.75    0.75 0.00  0.74  0.76  0.02
## neg_emodiv_d   18 150  0.75 0.00   0.75    0.75 0.00  0.74  0.76  0.02
##               skew kurtosis   se
## enthusiastic -0.21     0.13 0.21
## amused       -0.07    -0.42 0.20
## proud         2.79     7.28 0.03
## interested    0.14    -0.31 0.20
## calm         -0.18     0.05 0.21
## sad          -0.04     0.37 0.18
## ashamed      -0.21    -0.02 0.19
## guilty       -0.55     0.05 0.19
## hostile      -0.17    -0.06 0.22
## nervous       1.88     2.39 0.03
## pos_emodiv_g -0.26    -0.37 0.00
## neg_emodiv_g -0.50     0.02 0.00
## pos_emodiv_r  2.42     3.86 0.03
## neg_emodiv_r  1.65     0.72 0.03
## pos_emodiv_h  2.24     4.34 0.00
## neg_emodiv_h  1.46     0.56 0.00
## pos_emodiv_d  0.20     1.91 0.00
## neg_emodiv_d -0.01     0.38 0.00

Correlations:

Calculate correlations among the positive and negative emodiversity indices to look at similarities/differences depending on which metric (e.g. Gini coefficient vs. Shannon’s entropy) is used:

round(cor(df5[,c("pos_emodiv_g","pos_emodiv_r","pos_emodiv_h","pos_emodiv_d",
                 "neg_emodiv_g","neg_emodiv_r","neg_emodiv_h","neg_emodiv_d")]),2)
##              pos_emodiv_g pos_emodiv_r pos_emodiv_h pos_emodiv_d
## pos_emodiv_g         1.00         0.23         0.47         0.85
## pos_emodiv_r         0.23         1.00         0.95         0.66
## pos_emodiv_h         0.47         0.95         1.00         0.84
## pos_emodiv_d         0.85         0.66         0.84         1.00
## neg_emodiv_g        -0.06         0.10         0.07         0.01
## neg_emodiv_r         0.11        -0.06        -0.02         0.06
## neg_emodiv_h         0.09        -0.03         0.00         0.06
## neg_emodiv_d         0.01         0.05         0.05         0.04
##              neg_emodiv_g neg_emodiv_r neg_emodiv_h neg_emodiv_d
## pos_emodiv_g        -0.06         0.11         0.09         0.01
## pos_emodiv_r         0.10        -0.06        -0.03         0.05
## pos_emodiv_h         0.07        -0.02         0.00         0.05
## pos_emodiv_d         0.01         0.06         0.06         0.04
## neg_emodiv_g         1.00         0.35         0.53         0.86
## neg_emodiv_r         0.35         1.00         0.97         0.75
## neg_emodiv_h         0.53         0.97         1.00         0.87
## neg_emodiv_d         0.86         0.75         0.87         1.00

Step 5: Plot Emodiversity

Plot Distributions

Positive Emodiversity (Gini coefficient, richness, Shannon’s entropy, Simpson’s index):

ggplot(df5,aes(x=pos_emodiv_g)) +
  geom_histogram(colour="black",fill="white") +
  xlab("Gini Coefficient (Positive)") + ylab("Count")

ggplot(df5,aes(x=pos_emodiv_r)) +
  geom_bar(colour="black",fill="white") +
  scale_x_discrete(limits=c(4,5)) +
  xlab("Richness (Positive)") + ylab("Count")

ggplot(df5,aes(x=pos_emodiv_h)) +
  geom_histogram(colour="black",fill="white") +
  xlab("Shannon's Entropy (Positive)") + ylab("Count")

ggplot(df5,aes(x=pos_emodiv_d)) +
  geom_histogram(colour="black",fill="white") +
  xlab("Simpson's Index (Positive)") + ylab("Count")

Negative Emodiversity (Gini coefficient, richness, Shannon’s entropy, Simpson’s index):

ggplot(df5,aes(x=neg_emodiv_g)) +
  geom_histogram(colour="black",fill="white") +
  xlab("Gini Coefficient (Negative)") + ylab("Count")

ggplot(df5,aes(x=neg_emodiv_r)) +
  geom_bar(colour="black",fill="white") +
  scale_x_discrete(limits=c(4,5)) +
  xlab("Simpson's Reciprocal (Negative)") + ylab("Count")

ggplot(df5,aes(x=neg_emodiv_h)) +
  geom_histogram(colour="black",fill="white") +
  xlab("Shannon's Entropy (Negative)") + ylab("Count")

ggplot(df5,aes(x=neg_emodiv_d)) +
  geom_histogram(colour="black",fill="white") +
  xlab("Simpson's Index (Negative)") + ylab("Count")

Plot the Emotional Ecosystem for each Individual

Set-up code:

vars_to_change <- c("enthusiastic_c","amused_c","proud_c","interested_c","calm_c",
                    "sad_c","ashamed_c","guilty_c","hostile_c","nervous_c")

    for(x in 1:length(vars_to_change)){
        curr_var = vars_to_change[x]
        new_var = paste(c(curr_var, "_cat"),collapse='')
        df1[,new_var] = recode(df1[,curr_var],"0=NA;1:4=x") 
    }

# Sets
    emotions_c <- c("id","enthusiastic_c","amused_c","proud_c","interested_c","calm_c",
                    "sad_c","ashamed_c","guilty_c","hostile_c","nervous_c") 
    
    emotions_cat <- c("id", "enthusiastic_c_cat", "amused_c_cat", "proud_c_cat", 
                      "interested_c_cat", "calm_c_cat", "sad_c_cat", "ashamed_c_cat", 
                      "guilty_c_cat", "hostile_c_cat", "nervous_c_cat")
    
# Melt datasets 
  emotions_c_melt <- melt(df1[emotions_c],id=c("id"))
  emotions_cat_melt <- melt(df1[emotions_cat],id=c("id"))
  
# Rename variables
  names(emotions_c_melt) <- c("id","EmotionLabel","EmotionValue")
  names(emotions_cat_melt) <- c("id","CategoryLabel","CategoryValue")
  
# Combine 
  emotions <- cbind(emotions_c_melt,emotions_cat_melt[2:3])
  
# Remove missing data 
  emotions2 <- emotions[complete.cases(emotions),]
  
# Add a zero row for each emotion and and each participant (i.e. 10 new rows per participant) so that all spokes are printed
  idlist <- unique(emotions$id); idlist2 <- rep.int(idlist,10); id <- sort(idlist2)
  
  catLab <- c("enthusiastic_cat","amused_cat","proud_cat","interested_cat","calm_cat",
              "sad_cat","ashamed_cat","guilty_cat","hostile_cat","nervous_cat")
  
  CategoryLabel <- rep(catLab,length(idlist))
  
  catVal <- c(1:10); CategoryValue <- rep(catVal,length(idlist))
  
  emoLab <- c("enthusiastic_c","amused_c","proud_c","interested_c","calm_c",
              "sad_c","ashamed_c","guilty_c","hostile_c","nervous_c")
  
  EmotionLabel <- rep(emoLab,length(idlist))
  
  EmotionValue <- rep(0,1500) # number of id's * number of emotion types (150*10 = 1500)
  
  newdat <- cbind(id,EmotionLabel,EmotionValue,CategoryLabel,CategoryValue)
  
  emotions3 <- rbind(emotions2,newdat)
  
# Factor variables
  emotions3$CategoryValue <- factor(emotions3$CategoryValue,levels = c(1:10), 
                                labels = c("Enthusiastic", "Amused", "Proud", "Interested", "Calm",
                                           "Sad", "Ashamed", "Guilty", "Hostile", "Nervous"))
  
# Color palette for the radar plots
    palette <- c("#1D0047","#0A0060","#540041","#A70046","#C40060") 

Example radar plot code for one participant:

ggplot(emotions3[which(emotions3$id==1),], aes(x = CategoryValue, fill = factor(EmotionValue))) +
            geom_bar(width = .8) + 
            coord_polar() + 
            #ylim(0,14) + 
            scale_color_manual(values=palette) +
            scale_fill_manual(values=palette) +
            theme(axis.title=element_blank(),
                axis.text.x=element_text(colour="black",size=10),
                axis.text.y=element_blank(),
                axis.ticks=element_blank(),
                legend.position="none",
                panel.grid.major=element_line(colour="grey89"),
                panel.grid.minor=element_line(colour="grey89"),
                panel.background=element_rect(fill="white"))


Cautions

In cases where the total number of emotion experiences across all occasions was zero, the individual’s diversity score was treated as missing.

In their analysis of various diversity metrics, Benson et al (under review) showed that of the 8 distributions, 7 were highly skewed and only the Gini coefficient for negative emodiversity resembled a relatively normal distribution. This is especially important to keep in mind when using emodiversity as an outcome variable in a regression-based analysis, where the assumption of normally distributed scores is present.

Conclusions

This tutorial provides instruction on how to calculate diversity, with emodiversity used as the focal example. Code is presented for four diversity metrics. Please reference Benson et al (under review) for further information on how to select a diversity metric based on the theoretical conception of diversity (evenness, richness, or both), practicalities of the study design, and the scaling and distributional properties of diversity scores.

Overall, emodiversity (and other diversity-based IIV constructs) is a conceptually meaningful indicator of individuals’ emotion experiences as they relate to physical health. We hope future research continues to expand the study of diversity-type IIV constructs into other domains where discrete types of feelings and behaviors manifest (e.g., types of stressors, types of activities, types of locations, types of social partners), to better understand how the range of experiences an individual encounters contributes to health, well-being and successful aging.