This tutorial provides a guide for calculating diversity-type intraindividual variability (IIV) constructs, such as emodiversity using intensive longitudinal data (e.g. daily diary, ecological momentary assessment, experience sampling, etc). Examples of diversity-type IIV constructs include, but are not limited to, emodiversity, social diversity, activity diversity, and stressor diversity. Please see the corresponding paper “Fusing biodiversity metrics into investigations of daily life and healthy aging: Illustrations and recommendations with emodiversity” (Benson, Ram, Almeida, Zautra, & Ong, under review).
For the purposes of this guide, we utilize a toy dataset including design variables (id and time) and 10 emotion variables:
Generate id and time data:
id <- sort(rep(seq(1,150),30))
time <- rep(seq(1,30),150)
design <- data.frame(id,time)
Generate emotion data:
enthusiastic <- round(rnorm(4500,mean=2,sd=1),0)
amused <- round(rnorm(4500,mean=2,sd=1),0)
proud <- round(rnorm(4500,mean=-1,sd=1),0)
interested <- round(rnorm(4500,mean=2,sd=1),0)
calm <- round(rnorm(4500,mean=2,sd=1),0)
sad <- round(rnorm(4500,mean=2,sd=1),0)
ashamed <- round(rnorm(4500,mean=2,sd=1),0)
guilty <- round(rnorm(4500,mean=2,sd=1),0)
hostile <- round(rnorm(4500,mean=2,sd=1),0)
nervous <- round(rnorm(4500,mean=-1,sd=1),0)
emotion <- data.frame(enthusiastic, amused, proud, interested, calm,
sad, ashamed, guilty, hostile, nervous)
emotion[emotion < 1 | emotion > 5] <- 1
Create data frame with the design and emotion items:
df1 <- data.frame(design,emotion)
Plot the time series for one participant:
# Subset to get data for just participant #1
id1 <- df1[which(df1$id==1),]
# Reshape from long format to wide format
#install.packages("reshape")
library(reshape)
# Melt datasets
id1_melt <- melt(id1,id=c("id","time"))
# Rename variables
names(id1_melt) <- c("id","Time","EmotionLabel","EmotionValue")
# Plot
#install.packages("ggplot2")
library(ggplot2)
ggplot(id1_melt, aes(x = Time, y = EmotionValue, group = EmotionLabel, colour = EmotionLabel)) +
geom_line()
First, select the items that you will use to calculate diversity.
For the purposes of this demonstration, we utilize:
It is also possible to use other types of categories including (but not limited to) various types of stressors (e.g. work, family) or various types of social relationships (e.g. family member, friend, coworker).
Second, decide what type of measurement scale to use.
Two common options are continuous Likert-type (e.g. 1 to 5) and binary (e.g. “yes”/no“) measurement scales.
For the purposes of this tutorial, we demonstrate how to utilize both the continuous and binary scale options, based on the original Likert-type 1 to 5 scale that was used in the study design.
To recode the items, we use the recode
function in the car
package.
#install.packages("car")
library(car)
For the continuous scale option, we need to recode the items so that the minimum scale value is zero (rather than 1 in this case). This is necessary so that when we aggregate the data, responses corresponding to “none of the time” are not counted as emotion experiences.
vars_to_change <- c('enthusiastic', 'amused', 'proud', 'interested', 'calm',
'sad', 'ashamed', 'guilty', 'hostile', 'nervous')
for(x in 1:length(vars_to_change)){
curr_var = vars_to_change[x]
new_var = paste(c(curr_var, "_c"),collapse='')
df1[,new_var] = recode(df1[,curr_var],"NA=0;1=0;2=1;3=2;4=3;5=4")
# making assumption that NA means they didn't experience the emotion
}
For the binary scale option, we also need to recode items, if they were originally measured on a likert-type scale. It is up to the researcher on where to make the cut between absence and presence of a certain experience. For the purposes of this example, we recode values of 1 to 0 and values of 2-5 to 1.
vars_to_change <- c('enthusiastic', 'amused', 'proud', 'interested', 'calm',
'sad', 'ashamed', 'guilty', 'hostile', 'nervous')
for(x in 1:length(vars_to_change)){
curr_var = vars_to_change[x]
new_var = paste(c(curr_var, "_b"),collapse='')
df1[,new_var] = recode(df1[,curr_var],"NA=0;1=0;2=1;3=1;4=1;5=1")
# making assumption that NA means they didn't experience the emotion
}
Similarly, for data that are on a continuous scale, such as 0-100, the researcher will need to decide what if anything needs to be done with the scale. For research questions examining the diversity in the intensity of emotion experiences, maintaining this 0-100 scale is approporiate. For research questions examining the diversity in emotion experiences, some type of re-coding will need to be done so that certain values in the range correspond to “absence/0” and the other values correspond to “presence/1”.
Note on missing data: if there are completely missing rows of relevant data for a participant on a given occasion, these should be removed. If missing values occur for some items, but not others for a given occasion, possibilities include either removing the entire row of data or recoding missing values to 0, which makes the assumption that data are missing for this item because it wasn’t experienced on that particular occasion.
Third, aggregate the multi-occasion data into counts over time for each item.
For this example, we utilize the binary scale items, but the process is the same when using the continuous scale variables. There are many ways to aggregate data, and for this example we chose to use the ddply
function within the plyr
package.
#install.packages("plyr")
library(plyr)
df2 <- ddply(df1, "id", summarize,
enthusiastic = sum(enthusiastic_b, na.rm = TRUE),
amused = sum(amused_b, na.rm = TRUE),
proud = sum(proud_b, na.rm = TRUE),
interested = sum(interested_b, na.rm = TRUE),
calm = sum(calm_b, na.rm = TRUE),
sad = sum(sad_b, na.rm = TRUE),
ashamed = sum(ashamed_b, na.rm = TRUE),
guilty = sum(guilty_b, na.rm = TRUE),
hostile = sum(hostile_b, na.rm = TRUE),
nervous = sum(nervous_b, na.rm = TRUE))
Fourth, select a diversity metric to use.
There are many available diversity metrics, most of which are similar to some extent, with nuance as to which may be more or less useful in particular contexts.
Two options to aid in this choice include calculating a series of indices and looking at both the correlations among indices as well as comparing their distributional features.
Four potential diversity metrics include the Gini coefficient, richness, Shannon’s entropy, and Simpson’s index.
For this example, we calculate and compare all four of diversity metrics:
The Gini
function within the ineq
package is used to calculate the Gini coefficient.
#install.packages("ineq")
library(ineq)
The diversity
function within the vegan
package is used to calculate Shannon’s entropy and Simpson’s index.
#install.packages("vegan")
library(vegan)
\(GiniDiversity_{i} = G_{i} = 1 - \left ( \left( \frac{2\sum_{j=1}^m jc_{ij}}{m\sum_{j=1}^mc_{ij}} \right) - \frac{m+1}{m} \right)\)
where \(c_{ij}\) is the count of individual i’s experiences within j = 1 to m categories (e.g. species, emotion types) indexed in non-decreasing order \((c_{ij} < c_{ij+1})\). Diversity scores range from 0 to 1, with higher values indicating more “equitable” or diverse (emotional) ecosystems.
Calculate positive and negative emodiversity:
vars <- 3 # 1 column for id, 1 column for positive emodiversity, and 1 column for negative emodiversity
giniDat <- matrix(data=NA, nrow=length(df2$id), ncol=vars)
idlist <- df2$id
posEmotions <- c("enthusiastic","amused","proud","interested","calm")
negEmotions <- c("sad","ashamed","guilty","hostile","nervous")
sets <- list(posEmotions=posEmotions, negEmotions=negEmotions)
for(i in 1:length(idlist)){
for(x in 1:length(sets)){
curr_set = unlist(sets[x])
data <- df2[i,curr_set]
giniDat[i,1] <- idlist[i]
giniDat[i,x+1] <- 1-(Gini(data,na.rm=TRUE))
}
}
giniDat <- data.frame(giniDat)
names(giniDat) <- c("id","pos_emodiv_g","neg_emodiv_g")
df3 <- merge(df2,giniDat, by="id")
\(RichnessDiversity_{i} = R_{i}\)
where \(R_{i}\) is the number of j = 1 to m categories.
Calculate positive and negative emodiversity:
df4 <- matrix(data=NA,nrow=150,ncol=3)
idlist <- df3$id
sets <- list(posEmotions=posEmotions, negEmotions=negEmotions)
for(i in 1:length(idlist)){
for(x in 1:length(sets)){
curr_set = unlist(sets[x])
data <- df3[i,curr_set]
df4[i,1] <- idlist[i]
df4[i,x+1] <- length(which(data >=1))
}
}
# Set variable names
colnames(df4) <- c("id","pos_emodiv_r","neg_emodiv_r")
# Convert to data frame object
df4 <- data.frame(df4)
# Combine with other data
df5 <- merge(df3,df4,by=c("id"), all=TRUE)
\(ShannonDiversity_{i} = H_{i} = - \sum_{j=1}^m p_{ij}lnp_{ij}\)
where m is the number of discrete emotion types (i.e. angry, sad, enthusiastic), and \(p_{ij}\) is the proportion of individual i’s experiences that were of each discrete emotion type, j = 1 to m. Scores can range from 0 to ln(m), with higher scores indicating greater diversity (richness and evenness).
Calculate positive emodiversity:
df5$pos_emodiv_h <- diversity(df5[posEmotions], index = "shannon")
Calculate negative emodiversity:
df5$neg_emodiv_h <- diversity(df5[negEmotions], index = "shannon")
Simpson’s index has two parameterizations that, although scaled differently, may be interpreted as the probability that any two randomly selected experiences are of different emotion types. The inverse version is calculated as,
\(SimpsonDiversity_{i} = D_{i} = 1 - \sum_{j=1}^m p_{ij}^2\)
and ranges from 0 to 1. Higher scores indicate greater diversity (richness and evenness) of emotional experiences.
Calculate positive emodiversity:
df5$pos_emodiv_d <- diversity(df5[posEmotions], index = "simpson")
Calculate negative emodiversity:
df5$neg_emodiv_d <- diversity(df5[negEmotions], index = "simpson")
Calculate the descriptives for the positive and negative emodiversity indices:
#install.packages("psych")
library(psych)
describe(df5[,-1])
## vars n mean sd median trimmed mad min max range
## enthusiastic 1 150 20.19 2.54 20.00 20.23 2.97 12.00 26.00 14.00
## amused 2 150 20.72 2.44 21.00 20.74 2.97 15.00 26.00 11.00
## proud 3 150 0.12 0.35 0.00 0.02 0.00 0.00 2.00 2.00
## interested 4 150 20.63 2.46 21.00 20.59 2.97 15.00 27.00 12.00
## calm 5 150 21.00 2.60 21.00 21.03 2.97 13.00 27.00 14.00
## sad 6 150 20.67 2.26 20.50 20.68 2.22 14.00 27.00 13.00
## ashamed 7 150 20.31 2.39 20.00 20.34 2.97 14.00 26.00 12.00
## guilty 8 150 20.81 2.37 21.00 20.93 2.97 13.00 25.00 12.00
## hostile 9 150 20.63 2.67 21.00 20.69 2.97 13.00 27.00 14.00
## nervous 10 150 0.19 0.41 0.00 0.10 0.00 0.00 2.00 2.00
## pos_emodiv_g 11 150 0.76 0.02 0.76 0.76 0.02 0.71 0.80 0.09
## neg_emodiv_g 12 150 0.76 0.02 0.76 0.76 0.02 0.70 0.80 0.10
## pos_emodiv_r 13 150 4.11 0.32 4.00 4.02 0.00 4.00 5.00 1.00
## neg_emodiv_r 14 150 4.18 0.39 4.00 4.10 0.00 4.00 5.00 1.00
## pos_emodiv_h 15 150 1.39 0.02 1.38 1.38 0.00 1.36 1.46 0.10
## neg_emodiv_h 16 150 1.39 0.02 1.38 1.39 0.00 1.36 1.44 0.08
## pos_emodiv_d 17 150 0.75 0.00 0.75 0.75 0.00 0.74 0.76 0.02
## neg_emodiv_d 18 150 0.75 0.00 0.75 0.75 0.00 0.74 0.76 0.02
## skew kurtosis se
## enthusiastic -0.21 0.13 0.21
## amused -0.07 -0.42 0.20
## proud 2.79 7.28 0.03
## interested 0.14 -0.31 0.20
## calm -0.18 0.05 0.21
## sad -0.04 0.37 0.18
## ashamed -0.21 -0.02 0.19
## guilty -0.55 0.05 0.19
## hostile -0.17 -0.06 0.22
## nervous 1.88 2.39 0.03
## pos_emodiv_g -0.26 -0.37 0.00
## neg_emodiv_g -0.50 0.02 0.00
## pos_emodiv_r 2.42 3.86 0.03
## neg_emodiv_r 1.65 0.72 0.03
## pos_emodiv_h 2.24 4.34 0.00
## neg_emodiv_h 1.46 0.56 0.00
## pos_emodiv_d 0.20 1.91 0.00
## neg_emodiv_d -0.01 0.38 0.00
Calculate correlations among the positive and negative emodiversity indices to look at similarities/differences depending on which metric (e.g. Gini coefficient vs. Shannon’s entropy) is used:
round(cor(df5[,c("pos_emodiv_g","pos_emodiv_r","pos_emodiv_h","pos_emodiv_d",
"neg_emodiv_g","neg_emodiv_r","neg_emodiv_h","neg_emodiv_d")]),2)
## pos_emodiv_g pos_emodiv_r pos_emodiv_h pos_emodiv_d
## pos_emodiv_g 1.00 0.23 0.47 0.85
## pos_emodiv_r 0.23 1.00 0.95 0.66
## pos_emodiv_h 0.47 0.95 1.00 0.84
## pos_emodiv_d 0.85 0.66 0.84 1.00
## neg_emodiv_g -0.06 0.10 0.07 0.01
## neg_emodiv_r 0.11 -0.06 -0.02 0.06
## neg_emodiv_h 0.09 -0.03 0.00 0.06
## neg_emodiv_d 0.01 0.05 0.05 0.04
## neg_emodiv_g neg_emodiv_r neg_emodiv_h neg_emodiv_d
## pos_emodiv_g -0.06 0.11 0.09 0.01
## pos_emodiv_r 0.10 -0.06 -0.03 0.05
## pos_emodiv_h 0.07 -0.02 0.00 0.05
## pos_emodiv_d 0.01 0.06 0.06 0.04
## neg_emodiv_g 1.00 0.35 0.53 0.86
## neg_emodiv_r 0.35 1.00 0.97 0.75
## neg_emodiv_h 0.53 0.97 1.00 0.87
## neg_emodiv_d 0.86 0.75 0.87 1.00
Positive Emodiversity (Gini coefficient, richness, Shannon’s entropy, Simpson’s index):
ggplot(df5,aes(x=pos_emodiv_g)) +
geom_histogram(colour="black",fill="white") +
xlab("Gini Coefficient (Positive)") + ylab("Count")
ggplot(df5,aes(x=pos_emodiv_r)) +
geom_bar(colour="black",fill="white") +
scale_x_discrete(limits=c(4,5)) +
xlab("Richness (Positive)") + ylab("Count")
ggplot(df5,aes(x=pos_emodiv_h)) +
geom_histogram(colour="black",fill="white") +
xlab("Shannon's Entropy (Positive)") + ylab("Count")
ggplot(df5,aes(x=pos_emodiv_d)) +
geom_histogram(colour="black",fill="white") +
xlab("Simpson's Index (Positive)") + ylab("Count")
Negative Emodiversity (Gini coefficient, richness, Shannon’s entropy, Simpson’s index):
ggplot(df5,aes(x=neg_emodiv_g)) +
geom_histogram(colour="black",fill="white") +
xlab("Gini Coefficient (Negative)") + ylab("Count")
ggplot(df5,aes(x=neg_emodiv_r)) +
geom_bar(colour="black",fill="white") +
scale_x_discrete(limits=c(4,5)) +
xlab("Simpson's Reciprocal (Negative)") + ylab("Count")
ggplot(df5,aes(x=neg_emodiv_h)) +
geom_histogram(colour="black",fill="white") +
xlab("Shannon's Entropy (Negative)") + ylab("Count")
ggplot(df5,aes(x=neg_emodiv_d)) +
geom_histogram(colour="black",fill="white") +
xlab("Simpson's Index (Negative)") + ylab("Count")
Set-up code:
vars_to_change <- c("enthusiastic_c","amused_c","proud_c","interested_c","calm_c",
"sad_c","ashamed_c","guilty_c","hostile_c","nervous_c")
for(x in 1:length(vars_to_change)){
curr_var = vars_to_change[x]
new_var = paste(c(curr_var, "_cat"),collapse='')
df1[,new_var] = recode(df1[,curr_var],"0=NA;1:4=x")
}
# Sets
emotions_c <- c("id","enthusiastic_c","amused_c","proud_c","interested_c","calm_c",
"sad_c","ashamed_c","guilty_c","hostile_c","nervous_c")
emotions_cat <- c("id", "enthusiastic_c_cat", "amused_c_cat", "proud_c_cat",
"interested_c_cat", "calm_c_cat", "sad_c_cat", "ashamed_c_cat",
"guilty_c_cat", "hostile_c_cat", "nervous_c_cat")
# Melt datasets
emotions_c_melt <- melt(df1[emotions_c],id=c("id"))
emotions_cat_melt <- melt(df1[emotions_cat],id=c("id"))
# Rename variables
names(emotions_c_melt) <- c("id","EmotionLabel","EmotionValue")
names(emotions_cat_melt) <- c("id","CategoryLabel","CategoryValue")
# Combine
emotions <- cbind(emotions_c_melt,emotions_cat_melt[2:3])
# Remove missing data
emotions2 <- emotions[complete.cases(emotions),]
# Add a zero row for each emotion and and each participant (i.e. 10 new rows per participant) so that all spokes are printed
idlist <- unique(emotions$id); idlist2 <- rep.int(idlist,10); id <- sort(idlist2)
catLab <- c("enthusiastic_cat","amused_cat","proud_cat","interested_cat","calm_cat",
"sad_cat","ashamed_cat","guilty_cat","hostile_cat","nervous_cat")
CategoryLabel <- rep(catLab,length(idlist))
catVal <- c(1:10); CategoryValue <- rep(catVal,length(idlist))
emoLab <- c("enthusiastic_c","amused_c","proud_c","interested_c","calm_c",
"sad_c","ashamed_c","guilty_c","hostile_c","nervous_c")
EmotionLabel <- rep(emoLab,length(idlist))
EmotionValue <- rep(0,1500) # number of id's * number of emotion types (150*10 = 1500)
newdat <- cbind(id,EmotionLabel,EmotionValue,CategoryLabel,CategoryValue)
emotions3 <- rbind(emotions2,newdat)
# Factor variables
emotions3$CategoryValue <- factor(emotions3$CategoryValue,levels = c(1:10),
labels = c("Enthusiastic", "Amused", "Proud", "Interested", "Calm",
"Sad", "Ashamed", "Guilty", "Hostile", "Nervous"))
# Color palette for the radar plots
palette <- c("#1D0047","#0A0060","#540041","#A70046","#C40060")
Example radar plot code for one participant:
ggplot(emotions3[which(emotions3$id==1),], aes(x = CategoryValue, fill = factor(EmotionValue))) +
geom_bar(width = .8) +
coord_polar() +
#ylim(0,14) +
scale_color_manual(values=palette) +
scale_fill_manual(values=palette) +
theme(axis.title=element_blank(),
axis.text.x=element_text(colour="black",size=10),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
legend.position="none",
panel.grid.major=element_line(colour="grey89"),
panel.grid.minor=element_line(colour="grey89"),
panel.background=element_rect(fill="white"))
In cases where the total number of emotion experiences across all occasions was zero, the individual’s diversity score was treated as missing.
In their analysis of various diversity metrics, Benson et al (under review) showed that of the 8 distributions, 7 were highly skewed and only the Gini coefficient for negative emodiversity resembled a relatively normal distribution. This is especially important to keep in mind when using emodiversity as an outcome variable in a regression-based analysis, where the assumption of normally distributed scores is present.
This tutorial provides instruction on how to calculate diversity, with emodiversity used as the focal example. Code is presented for four diversity metrics. Please reference Benson et al (under review) for further information on how to select a diversity metric based on the theoretical conception of diversity (evenness, richness, or both), practicalities of the study design, and the scaling and distributional properties of diversity scores.
Overall, emodiversity (and other diversity-based IIV constructs) is a conceptually meaningful indicator of individuals’ emotion experiences as they relate to physical health. We hope future research continues to expand the study of diversity-type IIV constructs into other domains where discrete types of feelings and behaviors manifest (e.g., types of stressors, types of activities, types of locations, types of social partners), to better understand how the range of experiences an individual encounters contributes to health, well-being and successful aging.