Overview

To try this tutorial out yourself, you will need:

The goal of this tutorial is to demonstrate how to access your Qualtrics and Box data directly in R by connecting to their APIs (application programming interfaces). Through these interfaces, you can even push files back to Box (such as a cleaned data file). The skills you will learn in this tutorial could save you a lot of time and hard drive space in the future :)

CRAN documentation:

Vignettes

Much of what is included in the tutorial today was generated from the package vignettes, so please also check these out for additional information.

Other helpful resources:


Outline

Disclaimer: for security purposes, I can’t actually show any of my Qualtrics or Box information/data in the knitted html file (I will show some non-sensitive data in the live demo), but in some cases I have created “fake alternatives” so that you can still get a sense of what the information passed into and generated by the following functions should look like.


Section 0: Load Packages & Set Working Directory

The following code will check whether all packages needed for this tutorial are installed, and will automatically install them if they are not already installed:

list.of.packages <- c("qualtRics", "boxr", "rstudioapi")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

Load libraries:

library(qualtRics)
library(boxr)
library(rstudioapi)

Set working directory to source file location (or you can specify a different working directory):

current_path <- getSourceEditorContext()$path
setwd(dirname(current_path))

Section 1: Register your Credentials

The following sections (1.1 and 1.2) demo how to register your credentials. As mentioned previously, no real data can be shown for privacy and security reasons. However, to make this as user friendly as possible, I have included in the function calls some of the key arguments and the type of information that should be specified for each one. When you modify the portion of an argument to the right of the = sign for your own work, replace everything between the < and > symbols, including the symbols themselves, with your own values. Also, the functions in these packages have more arguments and options than can be covered in a single tutorial, so I urge you to dig further into the functions in your own time by visiting their help files.

To visit the help page for any given function, type a ‘?’ and then type the function like so:

?box_auth()

Section 1.1: Register your Qualtrics Credentials

To connect to the Qualtrics API, you will need to register your credentials, which assumes that you already have a Qualtrics account set up. There are two ways to do this:

  1. Register your credentials at the start of each R session
  2. Store a configuration file with your credentials in your working directory

Option 1: registering your credentials at the start of your R session. To do this, you will need to provide your API key and your institution-specific root url.

registerOptions(api_token = <"Your-API-Token">, root_url = <"Your-Root-URL">)

Here is an example of what credentials might look like (but note these aren’t actually real/valid credentials):

registerOptions(api_token = "LjqAfzx3NODLSFYNYHmmpz3g50DLBgwdk1x6OtO9", root_url = "pennstate.qualtrics.com")

Option 2: you can also create a qualtrics configuration file. For more information on how to set this up, visit the package vignette section on ‘Using a configuration file’. This allows you to set global options rather than specifying them as arguments when you retrieve data frames.
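Whichever option you choose, it can help to keep real credentials out of any script you share. The sketch below is only an illustration and is not part of the qualtRics package: the helper read_credentials() and the environment variable names MY_QUALTRICS_TOKEN and MY_QUALTRICS_ROOT_URL are hypothetical, and the values are fake. It shows one way to look credentials up at run time instead of typing them into the script.

```r
# Hypothetical helper (not part of qualtRics): look up credentials in
# environment variables. `vars` is a named character vector whose names are
# argument names and whose values are environment variable names.
read_credentials <- function(vars) {
  values <- Sys.getenv(vars, unset = NA)
  missing_vars <- vars[is.na(values)]
  if (length(missing_vars) > 0) {
    stop("Missing environment variable(s): ",
         paste(missing_vars, collapse = ", "))
  }
  names(values) <- names(vars)
  as.list(values)
}

# Made-up variable names and values, set here only so the sketch runs;
# in practice you would define them outside the script (e.g., in .Renviron).
Sys.setenv(MY_QUALTRICS_TOKEN = "not-a-real-token",
           MY_QUALTRICS_ROOT_URL = "example.qualtrics.com")

creds <- read_credentials(c(api_token = "MY_QUALTRICS_TOKEN",
                            root_url  = "MY_QUALTRICS_ROOT_URL"))
names(creds)

# The values could then be passed on, for example:
# registerOptions(api_token = creds$api_token, root_url = creds$root_url)
```

Because the helper stops with an error when a variable is missing, a typo in a variable name is caught before any request is sent.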


Section 1.2: Register your Box Credentials

To register your box credentials, you will need your client ID and your client secret. The directions for how to find this information can be found in the boxr package vignette.

In short:

1.) Create an app

  • Visit the box developers website
  • Click ‘My Apps’
  • Log-in
  • Create a new ‘app’ for your box.com account (name it whatever you want) - which will allow you to access your account via the API

2.) Set OAuth2 Parameters

  • Click the ‘Content API Access Only’ option
  • For your ‘redirect_uri’ enter: http://localhost
  • Note that this page also gives you your ‘client_id’ and ‘client_secret’, which you will (in the next step) copy into your R script.

3.) Connect boxr to your account

  • Enter your ‘client_id’ and ‘client_secret’ information into the corresponding arguments of the box_auth() call given below:
box_auth(client_id = "Your-Client-ID", client_secret = "Your-Client-Secret")

Here is an example of what credentials might look like (but note these aren’t actually real/valid credentials):

box_auth(client_id = "w3l4pty29lxwhw68pow4072pslndwv85", 
         client_secret = "l92RTYPLsH9b5pso4YLtP2PbInme4ayC")

If your credentials are accepted, a browser window will open for you to formally give yourself access to your files on box.com. From what I can tell, you only need to grant yourself this access in a browser window once.

After that, whenever you open a new session in R, you will need to run box_auth() again. As long as your credentials are still valid, you should see a message printed in your console that reads something like:

boxr: Authenticated at box.com as YOUR NAME (YOUR EMAIL ADDRESS)

You will not need to grant permission through the web browser interface again (or at least until you do the next major version update to R).


Section 2: Core Functions

The following sections (2.1 and 2.2) demo core functions in the qualtRics and boxr packages. As mentioned previously, no real data can be shown for privacy and security reasons. However, to make this as user friendly as possible, I have included in the function calls some of the key arguments and the type of information that should be specified for each one. When you modify the portion of an argument to the right of the = sign for your own work, replace everything between the < and > symbols, including the symbols themselves, with your own values. Also, the functions in these packages have more arguments and options than can be covered in a single tutorial, so I urge you to dig further into the functions in your own time by visiting their help files.

To visit the help page for any given function, type a ‘?’ and then type the function like so:

?getSurveys()

Section 2.1: Functions in qualtRics Package

The qualtRics package has a few core functions, including:

  • getSurveys()
    • Generates a dataframe containing information about all the surveys stored on your Qualtrics account.
  • getSurvey()
    • Not to be confused with getSurveys().
    • While the former function is used to get a look at information pertaining to all of your surveys, getSurvey() is used to actually read the data from a survey directly into R.
    • This eliminates the need to download the survey data onto your own computer (e.g., as a .csv file) before reading it into R.
  • readSurvey()
    • Reads csv files generated by Qualtrics

It also has a few handy helper functions including:

  • registerOptions()
    • Stores your API key and root url as variables in your working environment (shown above).
    • It has other useful capabilities such as an argument to specify whether to export survey responses as choice text (useLabels = TRUE is the default) or numeric values (useLabels = FALSE).
  • getSurveyQuestions()
    • Retrieves a data frame containing questions and question IDs for a survey
  • qualtRicsConfigFile()
    • Provides information on how to store information on your Qualtrics credentials

In Section 1.1, we touched on registerOptions(). Below is further information on the capabilities of the other key functions in the qualtRics package.

Core Functions:

getSurveys()

Example of how to get a list of surveys:

survey_info <- getSurveys()
survey_info
##                   id     name            ownerId         lastModified
## 1 SV_199H88JqO5vqR6h survey 1 UR_n11x68r37BRo6YM 2017-09-18T06:05:04z
## 2 SV_4992e3xLnA9AwS7 survey 2 UR_51F8jdzEn1G781U 2017-09-19T23:22:21z
## 3 SV_X8u6c53y42F5IDm survey 3 UR_NrHy3797YP628le 2017-09-20T07:06:05z
##   isActive
## 1    FALSE
## 2     TRUE
## 3     TRUE

Within this dataframe are the following variables:

  • id = survey id
  • name = the name you gave to each survey
  • ownerId = the Qualtrics generated ID for the person who created/ owns the survey
  • lastModified = the date and time of when the survey was last modified
  • isActive = logical TRUE/FALSE indicating whether the survey has active status or not

The id column is key here because you will need that information in order to pull specific survey data from your Qualtrics account (demonstrated next).
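To make the role of the id column concrete, here is a minimal sketch that looks up survey IDs by name or by active status instead of by row position. It rebuilds a small stand-in for survey_info from the fake values printed above (these are not real IDs), so it runs without a Qualtrics connection.

```r
# Stand-in for the data frame returned by getSurveys(), using the fake
# values from the example output (not real survey IDs)
survey_info <- data.frame(
  id       = c("SV_199H88JqO5vqR6h", "SV_4992e3xLnA9AwS7", "SV_X8u6c53y42F5IDm"),
  name     = c("survey 1", "survey 2", "survey 3"),
  isActive = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Look up one survey ID by the name you gave the survey
survey_id <- survey_info$id[survey_info$name == "survey 2"]

# Collect the IDs of all surveys that are currently active
active_ids <- survey_info$id[survey_info$isActive]

survey_id
active_ids
```

Selecting by name is a little more robust than indexing a fixed row, since the order of rows can change as surveys are added or modified.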

getSurvey()

Example of how to pull the first survey from the survey_info object (created above). I indexed the first row and first column of the dataframe to get the first survey ID, but you could also pass the character string itself (e.g., "SV_199H88JqO5vqR6h") to the surveyID argument (the argument also appears flexible enough to accommodate the ID as a numeric value). Additionally, I specified that variables should be imported as numeric values rather than choice text (useLabels = FALSE), and that I only wanted to pull certain question IDs (i.e., QID1, QID2, QID3, QID21):

survey_df <- getSurvey(surveyID = survey_info[1,1],  
                       useLocalTime = TRUE, 
                       force_request = TRUE, 
                       useLabels = FALSE,
                       includedQuestionIds = c("QID1", "QID2", "QID3", "QID21")) 
                            # Get the question IDs from getSurveyQuestions()
                            # includedQuestionIds only applies to user-generated questions
                            # Qualtrics-generated metadata is still returned

There are a few important arguments in this function to take note of (further arguments can be set through registerOptions(); you can find out more about these in the registerOptions() help file):

  • surveyID = is where you enter the ID (as a character string) for the specific survey you would like to read into R from qualtrics
  • lastResponseId = allows you to set the option of only exporting responses after a certain response id
  • startDate = filter results to only those received after a certain date (accepts dates as character strings in YYYY-MM-DD format)
  • endDate = filter results to only those received before a certain date
  • useLocalTime = if TRUE (defaults to FALSE), will use local time to determine response date values
  • force_request = if TRUE, the survey will always be loaded from the API rather than the temporary directory where it is stored on your computer.
  • useLabels = This is super handy for specifying whether you would like the data to be formatted as choice text (= TRUE) or numeric values (= FALSE).
  • convertStandardColumns = if TRUE (default), will convert general data columns (first name, last name, lat, lon, ip address, startdate, enddate, etc) to their proper format.
  • includedQuestionIds = is for specifying whether you only want to pull certain questions.
    • If you use this argument, I first recommend using the getSurveyQuestions(), illustrated below, to get the question IDs
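Since startDate and endDate take dates as character strings in YYYY-MM-DD format, it can be convenient to build those strings in R rather than typing them. The helper date_window() below is a hypothetical convenience function of my own, not something provided by qualtRics; it is a small sketch of one way to produce the two strings.

```r
# Hypothetical helper: build startDate/endDate strings covering the
# `days` days that end on `end`
date_window <- function(end = Sys.Date(), days = 30) {
  end <- as.Date(end)
  list(startDate = format(end - days, "%Y-%m-%d"),
       endDate   = format(end, "%Y-%m-%d"))
}

# A fixed end date is used here so the result is reproducible
window <- date_window(end = "2017-09-20", days = 30)
window$startDate
window$endDate

# The strings could then be supplied to the arguments described above, e.g.:
# getSurvey(surveyID = survey_info[1,1],
#           startDate = window$startDate, endDate = window$endDate)
```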

readRDS() and readSurvey()

  • When you call a survey from qualtrics, it is stored as an RDS file
  • You can either have this file stored in a temporary directory that will be cleared out when you end your R session or you can specify a directory where you would like to permanently store this file.
  • If you permanently store the .RDS file, you can use the readRDS() where you just need to specify the file path as a character string
rds_survey <- readRDS(file = <"/users/your-name/desktop/survey-name.rds">)
  • Alternatively, if you have already stored the data as a .csv file, you can use the readSurvey() to read in the data.
  • Similarly, all you need to specify is the file path
csv_survey <- readSurvey(file = <"/users/your-name/desktop/survey-name.csv">)
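If you would like to see how the .RDS route behaves before trying it on real survey data, the following sketch saves a small made-up data frame to R's temporary directory and reads it back. The object fake_survey and the file name are stand-ins I created for illustration; nothing here touches Qualtrics.

```r
# Made-up survey data standing in for a real export
fake_survey <- data.frame(
  ResponseID = c("R_001", "R_002"),
  Q1         = c(1L, 3L),
  stringsAsFactors = FALSE
)

# Save to the session's temporary directory, then read it back
rds_path <- file.path(tempdir(), "survey-name.rds")
saveRDS(fake_survey, file = rds_path)
rds_survey <- readRDS(file = rds_path)

# Check that the object read back matches what was saved
all.equal(fake_survey, rds_survey)
```

Files in tempdir() are removed when the R session ends, so point rds_path at a permanent folder if you want to keep the file.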

Helper Functions:

registerOptions()

Set global option to export survey responses as choice text:

registerOptions(useLabels = TRUE)

Set global option to export survey responses as numeric values:

registerOptions(useLabels = FALSE)

getSurveyQuestions()

Retrieves a data frame containing questions and question IDs for a survey

survey_questions <- getSurveyQuestions(surveyID = survey_info[1,1])
survey_questions
##    qid                              question
## 1 QID1 What did you have for breakfast today
## 2 QID2             Did you have coffee today
## 3 QID3     What did you have for lunch today
## 4 QID4    What did you have for dinner today
## 5 QID5         Did you have a midnight snack
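Once you have the questions in a data frame, you can select question IDs by searching the question text rather than copying IDs by hand. The sketch below rebuilds the fake survey_questions output shown above so that it runs on its own; the search pattern is just an example.

```r
# Stand-in for the data frame returned by getSurveyQuestions(),
# using the fake questions from the example output
survey_questions <- data.frame(
  qid = paste0("QID", 1:5),
  question = c("What did you have for breakfast today",
               "Did you have coffee today",
               "What did you have for lunch today",
               "What did you have for dinner today",
               "Did you have a midnight snack"),
  stringsAsFactors = FALSE
)

# Keep only the questions whose text mentions a main meal
meal_rows <- grepl("breakfast|lunch|dinner", survey_questions$question)
meal_qids <- survey_questions$qid[meal_rows]
meal_qids

# meal_qids could then be passed to getSurvey(), e.g.:
# getSurvey(surveyID = survey_info[1,1], includedQuestionIds = meal_qids)
```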

qualtRicsConfigFile()

  • Provides information on how to store information on your Qualtrics credentials. For more info, see the package vignette

Section 2.2: Functions in boxr Package

Key functions and their capabilities in the boxr package include:

  • box_search()
    • search the files in a box.com account
  • box_auth()
    • authenticate your box.com account
  • box_setwd()
    • get/set default box.com directory/folder
  • box_ls()
    • obtain a dataframe describing the contents of a box.com folder
  • box_dl() and box_ul()
    • download and upload individual files from/to box.com
  • box_read()
    • read files from box.com into memory as R objects
  • box_write()
    • write R objects to files remotely hosted on box.com
  • box_dir_create()
    • create a new box.com folder
  • box_delete()
    • delete folders or files
  • box_restore()
    • restore folders or files

box_search()

One way to find the box.com generated file or folder ID is to use box_search(). You can use the function to search for both files and folders at once, or you can use the ‘type =’ argument to restrict the search to only files or only folders.

# Print up to 10 results in console
  box_search(query = "tutorials")
# Extract full search results into a dataframe
  box_search_df <- as.data.frame(box_search(query = "tutorials")) 

The search results give quite a bit of useful information, such as:

  • name: name of folder or file
  • type: specifies whether folder or file
  • id: the box.com id of the folder or file
    • This can be used/ extracted to actually pull data from box.com
  • size: file size
  • description: description
  • owner: the email address of the user who owns the folder or file
  • path: the file path on box.com pointing to the particular folder or file
  • modified_at and content_modified_at: it’s unclear to me from the documentation how these two columns differ…but in general they refer to modifying files or folders.
  • version: denotes which version of the folder or file.
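As a rough sketch of how these columns can be used together, the code below filters a made-up version of box_search_df down to folders and then pulls out the ID of the folder with a given name. Only the name, type, and id columns are included, all of the IDs are fake, and storing the IDs as character strings is an assumption of this sketch.

```r
# Made-up search results with a subset of the columns described above
box_search_df <- data.frame(
  name = c("tutorials", "tutorial-data.csv", "old tutorials"),
  type = c("folder", "file", "folder"),
  id   = c("29049269215", "11111111111", "22222222222"),
  stringsAsFactors = FALSE
)

# Restrict the results to folders, then pick one folder by its exact name
folders <- box_search_df[box_search_df$type == "folder", ]
tutorials_id <- folders$id[folders$name == "tutorials"]
tutorials_id

# The ID could then be used in other boxr calls, e.g.:
# box_setwd(dir_id = tutorials_id)
```

Filtering on type first avoids accidentally passing a file ID to a function that expects a folder ID.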

box_setwd()

You can also set a box.com working directory if you primarily want to work within a certain box folder. The only argument this function takes is ‘dir_id’. It appears that either a numeric value or character string will work. You can also pull this information from the box_search_df object we generated in the last section:

box_setwd(dir_id = <ADD-YOUR-DIRECTORY-HERE>) 
# For example (note this is not a real folder id): 
    box_setwd(dir_id = 29049269215) 
# Or you can index from the box_search_df created earlier
    box_setwd(dir_id = box_search_df$id[1])

box_ls()

To access the contents of a folder:

# If you know the folder ID number
  box_folder_df <- as.data.frame(box_ls(dir_id = <INSERT-BOX.COM-ID-FOR-FOLDER-OF-INTEREST>))

# Or you can pull this information from the box_search object created above
  box_folder_df <- as.data.frame(box_ls(dir_id = box_search_df$id[1]))

box_dl() and box_ul()

These functions can be used to download individual files from box (into your working directory by default) and upload individual files to box.

Here is an example of downloading a file from box to working directory:

box_file_df <- box_dl(file_id = <INSERT-YOUR-FILE-ID-HERE>)
# Or, you can index this info from the box_folder_df created above
box_file_df <- box_dl(file_id = box_folder_df$id[3])

And here is an example of uploading a file from working directory to box:

box_ul(dir_id = <INSERT-YOUR-DIRECTORY-ID-HERE>, file = <INSERT-YOUR-FILE-PATH-HERE>)
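Before uploading, the file has to exist on your computer. Here is a minimal sketch of preparing a cleaned data file for upload; cleaned_df and the file name are made up for illustration, and the box_ul() call is left commented out because it needs a real, authenticated Box session.

```r
# Made-up cleaned data standing in for your own results
cleaned_df <- data.frame(id = 1:3, score = c(4.5, 3.0, 5.0))

# Write the data to a CSV file in the session's temporary directory
upload_path <- file.path(tempdir(), "cleaned-data.csv")
write.csv(cleaned_df, file = upload_path, row.names = FALSE)

# Confirm the file was written before trying to upload it
file.exists(upload_path)

# With an authenticated session, the path could then be passed to box_ul():
# box_ul(dir_id = <INSERT-YOUR-DIRECTORY-ID-HERE>, file = upload_path)
```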

box_read() and box_write()

These functions are similar to the previous functions we tried out (box_dl() and box_ul()), except that they read files from box.com into memory and write files to box.com from memory, rather than downloading a file into a specified directory or uploading a file from a specified directory. When you close your R session, the data will also be wiped from memory.

Example of how to read a file from box.com into memory:

box_file_df2 <- box_read(file_id = <INSERT-FILE-ID-NUMBER>)

Example of how to write data from memory to box.com:

box_write(x = <INSERT-R-OBJECT>, 
          filename = <INSERT-NAME-OF-FILE-TO-BE-UPLOADED>,
          dir_id = <INSERT-BOX.COM-FOLDER-ID-TO-UPLOAD-FILE-TO>)

Questions, Suggestions, Sharing, and Citations

If you have questions or suggestions on material to add to this tutorial, please contact Libby at leb237@psu.edu. We are happy for this tutorial to be shared, but rather than sharing it directly, please make sure to link to its page on the QuantDev website: https://quantdev.ssri.psu.edu/tutorials/using-qualtrics-and-boxr-r-packages

There are also many other cool tutorials given by the QuantDev group at: https://quantdev.ssri.psu.edu/tutorials/

Package citations:

citation("qualtRics")
## 
## To cite package 'qualtRics' in publications use:
## 
##   Jasper Ginn (2017). qualtRics: Download Qualtrics Survey Data
##   Directly into R. R package version 2.0.
##   https://CRAN.R-project.org/package=qualtRics
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {qualtRics: Download Qualtrics Survey Data Directly into R},
##     author = {Jasper Ginn},
##     year = {2017},
##     note = {R package version 2.0},
##     url = {https://CRAN.R-project.org/package=qualtRics},
##   }
## 
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
citation("boxr")
## 
## To cite package 'boxr' in publications use:
## 
##   Brendan Rocks (2017). boxr: Interface for the 'Box.com API'. R
##   package version 0.3.4. https://CRAN.R-project.org/package=boxr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {boxr: Interface for the 'Box.com API'},
##     author = {Brendan Rocks},
##     year = {2017},
##     note = {R package version 0.3.4},
##     url = {https://CRAN.R-project.org/package=boxr},
##   }
citation("rstudioapi")
## 
## To cite package 'rstudioapi' in publications use:
## 
##   Hadley Wickham and JJ Allaire (2016). rstudioapi: Safely Access
##   the RStudio API. R package version 0.6.
##   https://CRAN.R-project.org/package=rstudioapi
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {rstudioapi: Safely Access the RStudio API},
##     author = {Hadley Wickham and JJ Allaire},
##     year = {2016},
##     note = {R package version 0.6},
##     url = {https://CRAN.R-project.org/package=rstudioapi},
##   }
citation("base")
## 
## To cite R in publications use:
## 
##   R Core Team (2017). R: A language and environment for
##   statistical computing. R Foundation for Statistical Computing,
##   Vienna, Austria. URL https://www.R-project.org/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2017},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please
## cite it when using it for data analysis. See also
## 'citation("pkgname")' for citing R packages.