Overview

Dynamical system methods have been instrumental in explaining mechanisms of change and predicting future trajectories and states of psychological systems (Boker & Graham, 1998; Chow, Ferrer, & Hsieh, 2010). A natural next step is to use these methods to design control strategies and guide system trajectories (Molenaar & Nesselroade, 2015; Wang et al., 2014).

In this tutorial, we introduce the Boolean network method because it offers intuitive explanations of psychological dynamics that are useful for theorists, and suggestions for network control that are useful for practitioners. We demonstrate the utility of the Boolean network through analysis of multivariate binary time-series data simulated to mimic intensive longitudinal data (e.g., ecological momentary assessment).

Boolean network

The Boolean network method (Kauffman, 1969, 1993) is a specific instantiation of a discrete dynamical system model. Its special characteristic is the use of Boolean operators, namely AND (&), OR (|), and NOT (!), to describe and model the temporal dynamics between variables.

The Boolean network method is similar to a vector autoregression (VAR) model, except that the temporal relations are expressed using Boolean operators and the variables are binary.

Data types that can use the Boolean network method: The Boolean network method can be applied to intensive longitudinal data (e.g., ecological momentary assessment, physiological data, physical activity, behavior codes) to model temporal dynamics and facilitate control design. The method works on binary time-series; continuous-scale time-series need to be binarized first (see the “Tutorial of Boolean Network Analysis of Time-Series Data - Part 2 continuous-scale data” on the Quant Dev website).

We use the R package BoolNet (Müssel et al., 2010; Lähdesmäki et al., 2003) to conduct the data analysis in this tutorial.

This tutorial demonstrates the following five steps of the Boolean network approach:

  1. Simulation of binary data

  2. Inference of Boolean functions

  3. Plotting state space transition graph

  4. Extraction of attractors

  5. Network control (under development)

Step 1: Simulation of binary time-series data

Why simulate?

This step generates a binary multivariate time-series, which allows us to look at how the model works without empirical data. The advantage of simulated data is that we know the underlying temporal dynamics, so we can examine how well the method recovers those dynamics in its estimates.

How is the data simulated?

Data are simulated for a 3-node network. The time series is generated from pre-defined temporal relations (Boolean functions) plus process noise. For our hypothetical 3-node network, the Boolean functions are expressed in the following equations:

x(t+1) = x(t) & y(t)

y(t+1) = y(t)

z(t+1) = z(t)

Process noise is added to the equations by flipping each binary variable at each time step with a probability of 0.2. Technical details: we randomly generate numbers uniformly distributed between 0 and 1, and if the random number is less than 0.2, we flip the binary variable, which guarantees that the probability of noise disrupting a node is 0.2.
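As a quick, language-neutral illustration of this flipping mechanism (a Python sketch for exposition only; the R implementation appears in the code below):

```python
import random

rng = random.Random(1)  # seeded so the sketch is reproducible

def flip_with_noise(value, p_flip=0.2):
    """Flip a binary value (0/1) with probability p_flip."""
    return 1 - value if rng.random() < p_flip else value

# Over many draws, roughly 20% of the values are flipped
flips = sum(flip_with_noise(0) for _ in range(10_000))
print(flips / 10_000)  # close to 0.2
```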

The three-node network is a hypothetical example used for the purpose of illustration. The Boolean functions and noise probability were arbitrarily chosen.

We simulate 100 steps to generate the time-series.

n.obs <- 100 # number of observations
p <- 3       # number of variables

# Randomly generate a number between 0 and 1 (uniformly distributed) for each
# time step. To flip a node with probability .2, we compare this random number
# with .2 and flip the node whenever the random number is less than .2.
zeta <- cbind(runif(n.obs, 0, 1),
              runif(n.obs, 0, 1),
              runif(n.obs, 0, 1)) # 3 time-series of noise for 3 variables
zeta <- t(zeta) # adjust zeta to a wide format

time.series <- matrix(rep(0, p * n.obs), nrow = p, ncol = n.obs)
time.series[, 1] <- runif(p, 0, 1) # initial values of the three variables are randomly chosen

time.series[, 1] <- ifelse(time.series[, 1] >= .5, TRUE, FALSE) # if the randomly generated number is at least .5, convert the value to TRUE, otherwise to FALSE

# probability threshold for flipping a node
chance.to.flip <- 0.2


for (col in 2:n.obs)
{
  # simulate each time step based on the previous time step and Boolean functions
  time.series[1,col] <-time.series[1,col-1] & time.series[2,col-1] # x = x AND y
  time.series[2,col] <-time.series[2,col-1] # y = y
  time.series[3,col] <-time.series[3,col-1] # z = z
  
  # add noise: if the random number in zeta is less than .2, flip the node by applying NOT
  time.series[1,col] <- ifelse(zeta[1,col] < chance.to.flip, 
                                               !time.series[1,col],
                                               time.series[1,col] )
  time.series[2,col] <- ifelse(zeta[2,col] < chance.to.flip, 
                                               !time.series[2,col],
                                               time.series[2,col] )
  time.series[3,col] <- ifelse(zeta[3,col] < chance.to.flip, 
                                               !time.series[3,col],
                                               time.series[3,col] )
}

# adjust the time-series to long-format
time.series <- t(time.series)
time.series <- data.frame(time.series)
names(time.series) <- c("x", "y", "z")

Plot the binary time-series

Here we plot the simulated time-series in its binary format. A colored area indicates the variable is 1; a white area indicates it is 0.

# specify the colors of the time-series
# gg_color_hue() is a small helper function (not shown here) that
# reproduces ggplot2's default hue palette
n = p
cols = gg_color_hue(n)

# for plotting purposes, reshape the time-series to long format with
# melt() from the reshape2 package
bin.data.plot <- time.series
bin.data.plot <- data.frame(bin.data.plot)
bin.data.plot$index <- 1:nrow(bin.data.plot)

bin.data.plot.melt <- melt(bin.data.plot, id = "index")

bin.data.plot.melt$value.variable <- ifelse(bin.data.plot.melt$value == 0, "0",
                                            as.character(bin.data.plot.melt$variable))
    
bin.data.plot.melt$value.variable <- factor(bin.data.plot.melt$value.variable,
                                            levels = c("0","x",
                                                       "y",
                                                       "z"))

ggplot(data = bin.data.plot.melt)+
        geom_rect(aes(xmin = index - .5, xmax = index + .5,
                      ymin = 0, ymax = 1, fill = factor( value.variable)))+
   facet_wrap(~variable, ncol = 1) +
  scale_fill_manual(values = c("#FFFFFF", cols))+
  theme(
    strip.background = element_blank(),
    panel.background = element_blank(),
    legend.title =   element_blank(),
    legend.key = element_blank(),
    legend.position = "none",
    axis.text.y=element_text(color="black",size=12),
    axis.text.x=element_text(color="black",size=12),
    axis.title.y=element_text(color="black",size=12),
    axis.title.x=element_text(color="black",size=12),
    axis.line = element_line(color = 'black'))+
  ylim(0,1)+
  xlab("Time") 

Step 2: Inference of Boolean functions

net.data <- t(time.series)
network.size<- p

booleannet <- reconstructNetwork(net.data,
                          method = "bestfit",
                          maxK = 2,
                          readableFunctions=T,
                          returnPBN = F)
print(booleannet)
## Probabilistic Boolean network with 3 genes
## 
## Involved genes:
## x y z
## 
## Transition functions:
## 
## Alternative transition functions for gene x:
## x = (x & y) (error: 21)
## 
## Alternative transition functions for gene y:
## y = (y) (error: 14)
## 
## Alternative transition functions for gene z:
## z = (z) (error: 23)
# saveNetwork(booleannet, paste(outputpath, id, "-network.txt", sep = "")) # cannot save a collection of Boolean networks

We can see that the inference recovers the Boolean functions from which the data were simulated:

x(t+1) = x(t) & y(t)

y(t+1) = y(t)

z(t+1) = z(t)

Note that this only showcases how Boolean network inference works. Future work could carry out a comprehensive simulation study, with robust testing of multiple factors, including a variety of forms of Boolean functions, various observation lengths, and magnitudes of noise.

How are Boolean functions inferred?

The details of how the Boolean functions are inferred can be found in Akutsu et al. (2000). The essence is to compare the outcome variable’s time-series at time t+1 with the time-series of the input variables at time t after applying the Boolean operators AND, OR, and NOT.

What is the error rate?

The error rate here is the sum of false negatives (predicting the outcome as 0 when it was actually 1) and false positives (predicting the outcome as 1 when it was actually 0). Note that the simulated time-series contains 100 time points, so an error of 21 indicates that 21 of the 100 time points were predicted incorrectly by the “bestfit” algorithm.
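To make the error-counting concrete, here is a minimal Python sketch (not BoolNet's actual implementation; the function and variable names are hypothetical) that scores a candidate Boolean function against a time-series:

```python
def error_count(series, target, candidate):
    """Count mispredictions of `target` at t+1 by `candidate` applied at time t."""
    T = len(series[target])
    errors = 0
    for t in range(T - 1):
        state_t = {v: series[v][t] for v in series}  # system state at time t
        if candidate(state_t) != series[target][t + 1]:
            errors += 1
    return errors

# A short noise-free series consistent with x(t+1) = x(t) & y(t)
series = {"x": [1, 1, 0, 0, 0], "y": [1, 0, 1, 1, 1], "z": [0, 0, 0, 0, 0]}
and_rule = lambda s: s["x"] & s["y"]
print(error_count(series, "x", and_rule))  # 0: the AND rule fits this series perfectly
```

An error of 21 in the BoolNet output above is this kind of count, accumulated over the noisy 100-point series.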

How to interpret the Boolean functions?

Take function x = (x & y) as an example:

This indicates that x(t+1) is predicted by x(t) AND y(t), which means x(t+1) can be 1 only when both x(t) and y(t) are 1; otherwise, x(t+1) will be 0. The AND rule is similar to a multiplicative rule: whenever there is a 0 in an AND rule, the outcome of the product will be 0.

Another way to state this Boolean function: x will be turned ON at time t+1 only if both x and y were ON at time t.
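The multiplicative character of the AND rule is easy to verify from its truth table (a quick Python check):

```python
# Truth table for x(t+1) = x(t) AND y(t)
for x in (0, 1):
    for y in (0, 1):
        print(f"x={x}, y={y} -> x AND y = {x & y}")
# Only the row with x=1 and y=1 yields 1; a 0 anywhere forces the outcome
# to 0, just like multiplication by zero.
```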

What are the possible applications in psychology?

In psychology, when two variables must work together to activate another variable, that is an AND rule. For example, when both parent and child are yelling at each other, the child will continue to yell at the parent; if either the parent or the child stops yelling, the child will stop yelling. A Boolean function such as yelling_child(t+1) = yelling_child(t) & yelling_parent(t) describes such a dynamic.

Here, we select the first network when more than one rule is inferred per node, so that the attractors can be extracted.

# note here we only chose the first function of all possible functions
singlenet <- chooseNetwork(booleannet,
            functionIndices = rep(1,network.size),
            dontCareValues=rep(1,network.size),
            readableFunctions=T)
print(singlenet)
## Boolean network with 3 genes
## 
## Involved genes:
## x y z
## 
## Transition functions:
## x = (x & y)
## y = (y)
## z = (z)

Why extract attractors first?

Note that we extract the attractors first (Step 4) because the state transition graph needs the attractor object as an input. We will explain what the attractors mean later, as the state transition graph (Step 3) will help you understand the concept of an attractor.

ga <- getAttractors(network = singlenet,
                  returnTable = TRUE)

# print(ga, activeOnly=TRUE)

Step 3: Plot state space transition graph

p<-plotStateGraph(ga,
               piecewise=TRUE,
               drawLabels = T,
               plotIt = F,
               colorsAlpha = c(colorBasinsNodeAlpha    = 1,
                               colorBasinsEdgeAlpha    = 1,
                               colorAttractorNodeAlpha = 1,
                               colorAttractorEdgeAlpha = 1))
plot.igraph(p, 
            label.cex = 1.2, 
            vertex.label.color="black", 
            vertex.label.dist=2, 
            remove.loops = T,
            edge.arrow.size=.4)

What does state transition graph mean?

The state transition graph shows how one state transitions to another: each node is a state of the system (a vector of the states of x, y, and z), and the arrows/edges are state-to-state transitions.

A state of the system is a vector of the binary states of the variables, e.g., state “100” is x = 1, y = 0, z = 0.

The state “100” transitions to state “000”. The state transition is computed from the Boolean functions inferred in Step 2. For example, because x(t+1) = x(t) & y(t), when x(t) = 1 and y(t) = 0 we get x(t+1) = 1 & 0 = 0. Similarly, y(t+1) = y(t) = 0 and z(t+1) = z(t) = 0, so the next state of the system is (0, 0, 0); hence state “100” transitions to state “000”.
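You can verify this transition by applying the inferred functions directly. A minimal Python sketch, representing a system state as an (x, y, z) tuple:

```python
def next_state(state):
    """Apply the inferred rules: x' = x & y, y' = y, z' = z."""
    x, y, z = state
    return (x & y, y, z)

print(next_state((1, 0, 0)))  # (0, 0, 0): state "100" transitions to "000"
print(next_state((1, 1, 1)))  # (1, 1, 1): state "111" maps to itself
```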

What is an attractor and how can I tell from the state transition graph?

When a state transitions back to itself, it is an attractor, as the system gravitates toward it. Each attractor contains one or multiple attractor state(s) in which the system tends to stay in the long run. When there is one state in which the system stays, it is called a fixed-point attractor; when there are multiple states through which the system rotates, it is called a limit cycle or complex attractor.

The set of states that all go to the same attractor is called an attractor basin, because once the system enters any of these states, it will eventually reach that attractor; just like a ball falling into a basin will eventually settle at the lowest point and stay there.
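For a deterministic network this small, the attractors and basins can be enumerated by brute force over all 2^3 = 8 states. A Python sketch using the inferred rules:

```python
from itertools import product

def next_state(state):
    x, y, z = state
    return (x & y, y, z)  # inferred rules: x' = x & y, y' = y, z' = z

all_states = list(product((0, 1), repeat=3))

# Fixed-point attractors: states that map to themselves
attractors = [s for s in all_states if next_state(s) == s]
print(len(attractors))  # 6, matching the six attractors BoolNet reports in Step 4

# Basin of the all-zero attractor: states that eventually reach (0, 0, 0)
def reaches(state, target, max_steps=8):
    for _ in range(max_steps):
        if state == target:
            return True
        state = next_state(state)
    return False

print([s for s in all_states if reaches(s, (0, 0, 0))])  # [(0, 0, 0), (1, 0, 0)]
```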

What are the possible applications in psychology?

In psychology, using the same example of parent-child dynamics, the parent and child can both be stuck in an attractor where they are yelling at each other - mutual hostility. Discovering where the attractors are thus leads to an understanding of the problems in the system; this also points ahead to the network control in Step 5, because we naturally want to intervene when problematic or undesirable attractors are found.

Step 4: Extraction of attractors

print(ga, activeOnly=TRUE)
## Attractor 1 is a simple attractor consisting of 1 state(s) and has a basin of 2 state(s).
## Active genes in the attractor state(s):
## State 1: --
## 
## Attractor 2 is a simple attractor consisting of 1 state(s) and has a basin of 1 state(s).
## Active genes in the attractor state(s):
## State 1: y
## 
## Attractor 3 is a simple attractor consisting of 1 state(s) and has a basin of 1 state(s).
## Active genes in the attractor state(s):
## State 1: x, y
## 
## Attractor 4 is a simple attractor consisting of 1 state(s) and has a basin of 2 state(s).
## Active genes in the attractor state(s):
## State 1: z
## 
## Attractor 5 is a simple attractor consisting of 1 state(s) and has a basin of 1 state(s).
## Active genes in the attractor state(s):
## State 1: y, z
## 
## Attractor 6 is a simple attractor consisting of 1 state(s) and has a basin of 1 state(s).
## Active genes in the attractor state(s):
## State 1: x, y, z

What do we know about the attractors?

We already extracted the attractors and explained them previously; here we print all the attractors and which nodes are active in each attractor state.

There are 6 attractors in this system, and the active nodes in each attractor state are shown respectively. E.g., attractor 6 has all 3 nodes turned ON in its attractor state, i.e., state “111”.

Step 5: Network control (under development)

Why is network control useful?

After extracting the attractors, we can define each attractor’s desirability based on practical concerns. Using the parent-child example, we might want to direct the parent and/or child out of a yelling attractor. That is why network control is useful.

How to design network control?

Network control can be designed based on a search algorithm that finds a node to flip so that the system moves from an undesirable attractor to a desirable one, as an intervention. A hypothetical example: if “100” is a desirable attractor and “101” is an undesirable one, we can flip the third node so that “101” becomes “100”, moving the system into the desirable attractor in one step.
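A brute-force version of such a search can be sketched in Python. This uses the rules inferred in this tutorial with hypothetical desirability labels ("111" undesirable, "001" desirable), not BoolNet's control machinery:

```python
def next_state(state):
    x, y, z = state
    return (x & y, y, z)  # inferred rules: x' = x & y, y' = y, z' = z

def reaches(state, target, max_steps=8):
    for _ in range(max_steps):
        if state == target:
            return True
        state = next_state(state)
    return False

def find_single_flip(bad, good):
    """Find one node whose flip moves the system from `bad` into the basin of `good`."""
    for i in range(len(bad)):
        flipped = list(bad)
        flipped[i] = 1 - flipped[i]
        if reaches(tuple(flipped), good):
            return i
    return None

# "111" (all ON) is an attractor under the inferred rules; flipping node y
# (index 1) yields "101", which then transitions into the attractor "001".
print(find_single_flip((1, 1, 1), (0, 0, 1)))  # 1
```

With only three nodes, exhaustive search is trivial; larger networks would need more sophisticated control algorithms.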

What are the possible applications in psychology?

For a parent-child system, a node flip could direct the system out of an undesirable state (e.g., mutual hostility) into a desirable state (e.g., both behaving neutrally toward each other). A hypothetical example: if the parent can turn the yelling OFF, the system might eventually reach the mutually neutral state; this could serve as an intervention strategy to move the family system into a much more amicable attractor.

To be continued

This step is currently under development and testing, so come back after a month or so, when I finish the development and testing of the algorithm!

References

Akutsu, T., Miyano, S., Kuhara, S. (2000). Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function. Journal of Computational Biology, 7(3), 331-343.

Boker, S., & Graham, J. (1998). A dynamical systems analysis of adolescent substance use. Multivariate Behavioral Research, 33(4), 479-507.

Chow, S., Ferrer, E., & Hsieh, F. (2010). Statistical Methods for Modeling Human Dynamics: An Interdisciplinary Dialogue (Notre Dame Series on Quantitative Methodology, Vol 4). New York, NY: Taylor & Francis.

Kauffman, S. (1969). Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22, 437-467.

Kauffman, S. (1993). The Origins of Order: Self-organization and Selection in Evolution. Oxford, UK: Oxford University Press.

Lähdesmäki, H., Shmulevich, I., & Yli-Harja, O. (2003). On learning gene regulatory networks under the Boolean network model. Machine Learning, 52(1), 147-167.

Molenaar, P.C.M., & Nesselroade, J.R. (2015). Systems methods for developmental research. In R.M. Lerner (Ed) Handbook of Child Psychology and Developmental Science (pp 1-31). New York, NY: Wiley.

Müssel, C., Hopfensitz, M., & Kestler, H. A. (2010). BoolNet – An R package for generation, reconstruction, and analysis of Boolean networks. Bioinformatics, 26(10), 1378-1380.

Wang, Q., Molenaar, P.C.M., Harsh, S., Freeman, K., Xie, J., Gold, C., Rovine, M., & Ulbrecht, J. (2014). Personalized state-space modeling of glucose dynamics for Type I diabetes using continuously monitored glucose, insulin dose, and meal intake: An extended Kalman filter approach. Journal of Diabetes Science and Technology, 8(2), 331-345.