Preparing to Analyze Your Data in R

If you'd like to follow along with this tutorial but don't have an R development environment set up, consider using RStudio Cloud, a free service from the RStudio team.

Most of the code and directions on this page are deprecated; the functionality has been integrated into the LAMP API. See 'Preparing to Analyze Your Data in Python' for up-to-date data analysis steps.

Connect to your LAMP server.

First, install the LAMP client library.

install.packages("devtools")
devtools::install_github('BIDMCDigitalPsychiatry/LAMP-r')

Then connect to the LAMP API. Your Researcher ID comes from the dashboard at dashboard.lamp.digital.

library(LAMP)
LAMP <- LAMP$new('https://api.lamp.digital', 'your_email_here@email.com', 'your_password_here')
researcher_id <- "YOUR_RESEARCHER_ID"

Download data from the LAMP server.

First, download all Participants and retrieve the ID mapping for all Activities in our Study.

library(dplyr)
library(anytime)
library(data.table)

participants <- LAMP$Participant$allByStudy(researcher_id) %>% pull(id)
activity_map <- LAMP$Activity$allByStudy(researcher_id) %>% 
  dplyr::select(id, spec, name) %>%
  rename(activity = id)

Next, we pre-process the nested data structure that the ActivityEvent API returns for each Participant, looping over all Participants in our Study. Entries that are null (missing or invalid) are skipped. Timestamps arrive as milliseconds since the UNIX epoch, so we divide them by 1000 and convert them to the human-readable YYYY-MM-DD format before flattening the data. The code also includes a workaround for excessively nested survey items.
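
As a minimal sketch of the timestamp conversion described above, the following uses only base R rather than the anytime package used in this tutorial. The timestamp value is hypothetical and chosen purely for illustration.

```r
# Hypothetical timestamp in milliseconds since the UNIX epoch
ts_ms <- 1575806400000

# Divide by 1000 to get seconds, then convert to a date-time in a chosen time zone
ts <- as.POSIXct(ts_ms / 1000, origin = "1970-01-01", tz = "America/New_York")

# Format the date-time as YYYY-MM-DD
day <- format(ts, "%Y-%m-%d")
print(day)
```

The time zone matters: the same instant can fall on different calendar days depending on the value passed as tz.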

The below section of code is now deprecated and integrated into the LAMP API.

data_list <- list()
for (i in seq_along(participants)) {
  tmp <- LAMP$ActivityEvent$allByParticipant(participants[i])
  if (length(tmp) > 0) {
    # Attach activity names and convert millisecond timestamps to date-times
    tmp <- right_join(activity_map %>% select(activity, name), 
                      jsonlite::flatten(tmp) %>% 
                        mutate(timestamp = anytime(as.numeric(timestamp)/1000, tz = "America/New_York")) %>%
                        rename(activity_duration = duration) %>% 
                        mutate(id = participants[i]),
                      by = "activity")
    # Expand each event's nested temporal_events into one row per item
    tmp_event_list <- list()
    for (j in seq_len(nrow(tmp))) {
      if (length(tmp[j,]$temporal_events[[1]]) > 0) {
        tmp_event_list[[j]] <- cbind(tmp[j,], tmp[j,]$temporal_events[[1]], row.names = FALSE) %>% 
          dplyr::select(-temporal_events)
      }
    }
    data_list[[i]] <- rbindlist(tmp_event_list, fill = TRUE)
  }
}
# Drop Participants that returned no data
data_list[sapply(data_list, is.null)] <- NULL
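
To make the flattening step above easier to follow, here is a small base-R sketch on toy data. The data frame, its column names, and its values are made up for illustration; real ActivityEvent data has more fields.

```r
# Toy stand-in for one Participant's events: one row per activity, with the
# individual responses nested in a list column (all names are illustrative)
events <- data.frame(activity = c("a1", "a2", "a3"), stringsAsFactors = FALSE)
events$temporal_events <- list(
  data.frame(item = c("q1", "q2"), value = c(1, 2)),
  NULL,                                  # no responses recorded for a2
  data.frame(item = "q1", value = 3)
)

rows <- list()
for (j in seq_len(nrow(events))) {
  nested <- events$temporal_events[[j]]
  if (length(nested) > 0) {
    # Repeat the activity-level field once per nested response
    rows[[length(rows) + 1]] <- cbind(activity = events$activity[j], nested)
  }
}

# Stack the per-activity pieces into one flat table
flat <- do.call(rbind, rows)
print(flat)
```

Each nested response becomes its own row, and activities without responses contribute no rows at all.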

Now, we combine the data frames we’ve pre-processed and coerce the identifier columns to character so they follow a consistent format. Calling the head() function previews the resulting data frame. To view all available variables in your data frame, use colnames(results); to keep only the variables of interest, select them with dplyr::select(). The available columns include meta information about the Activity (such as whether it’s a survey or game), the individual survey question or game level results, and other optional game data (static_data, which may include NA values). To learn more about what each of these columns contains, represents, and can be used for, please see the help topic.

results <- rbindlist(data_list, fill = TRUE) %>% 
  mutate(id = as.character(id),
         activity = as.character(activity),
         name = as.character(name))

head(results)

`Code Output`

id  timestamp  name                                                                                                 item value
U1094374134 2019-12-08 PHQ-8       How often did you feel bad about yourself, or that you were a failure or let your family down?     1
U1094374134 2019-12-08 PHQ-8          How often did have you have trouble concentrating on things such as reading or watching tv?     1
U1094374134 2019-12-08 PHQ-8       How often did you find yourself moving so slowly, or so fidgety/restless, that others noticed?     1
U1094374134 2019-12-08 PHQ-8 How often did you have trouble falling or staying asleep, or sleep for more hours than you meant to?     1
U1094374134 2019-12-08 PHQ-8                                          How often did you feel tired or like you had little energy?     1
U1094374134 2019-12-08 PHQ-8                  How often did you find yourself with no appetite, or eating more than you meant to?     1
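
The combining step relies on rbindlist() from data.table with fill = TRUE. The sketch below shows what that option does on two toy tables; the table contents are made up for illustration.

```r
library(data.table)

# Two toy per-participant tables with different columns (names are illustrative)
p1 <- data.frame(id = "U1", item = "q1", value = 1)
p2 <- data.frame(id = "U2", item = "q1", value = 2, static_data = "x")

# fill = TRUE pads columns that are missing from a table with NA
# instead of raising an error about mismatched columns
combined <- rbindlist(list(p1, p2), fill = TRUE)
print(combined)
```

This is why some columns, such as static_data, can contain NA values for activities that never report them.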

Optional: Output the data to a CSV.

write.csv(results, "./output-1-27-20.csv", row.names = FALSE)

Appendix: Sample Analyses

Check out all the activities in the study

This includes both default activities and custom activities.

activity_map %>% select(activity, spec, name)

`Code Output`

##                    activity                    spec                                     name
## 1  QWN0aXZpdHk6MDoxMDM6MjM~              lamp.group                    Daily Survey Check-In
## 2  QWN0aXZpdHk6MToxMDM6MzIw             lamp.survey                                    PHQ-8
## 3  QWN0aXZpdHk6MToxMDM6MzQx             lamp.survey                             Instructions
## 4  QWN0aXZpdHk6MToxMDM6MzQy             lamp.survey                                    GAD-7
## 5  QWN0aXZpdHk6MToxMDM6MzQz             lamp.survey                                 PIU-SF-6
## 6  QWN0aXZpdHk6MToxMDM6MzQ0             lamp.survey                                  Warning
## 7  QWN0aXZpdHk6MToxMDM6MzQ1             lamp.survey Qualitative Digital Media Use Assessment
## 8  QWN0aXZpdHk6MToxMDM6MzQ2             lamp.survey           Screen Time Use - iPhones only
## 9  QWN0aXZpdHk6MToxMDM6MzQ3             lamp.survey                iPhone/Android Assessment
## 10 QWN0aXZpdHk6MjoxMDM6MA~~              lamp.nback                                   N-Back
## 11 QWN0aXZpdHk6MzoxMDM6MA~~           lamp.trails_b                                 Trails B
## 12 QWN0aXZpdHk6NDoxMDM6MA~~       lamp.spatial_span                             Spatial Span
## 13 QWN0aXZpdHk6NToxMDM6MA~~      lamp.simple_memory                            Simple Memory
## 14 QWN0aXZpdHk6NjoxMDM6MA~~           lamp.serial7s                                Serial 7s
## 15 QWN0aXZpdHk6NzoxMDM6MA~~      lamp.cats_and_dogs                            Cats and Dogs
## 16 QWN0aXZpdHk6ODoxMDM6MA~~     lamp.3d_figure_copy                           3D Figure Copy
## 17 QWN0aXZpdHk6OToxMDM6MA~~ lamp.visual_association                       Visual Association
## 18 QWN0aXZpdHk6MTA6MTAzOjA~         lamp.digit_span                               Digit Span
## 19 QWN0aXZpdHk6MTE6MTAzOjA~  lamp.cats_and_dogs_new                        Cats and Dogs New
## 20 QWN0aXZpdHk6MTI6MTAzOjA~     lamp.temporal_order                           Temporal Order
## 21 QWN0aXZpdHk6MTM6MTAzOjA~          lamp.nback_new                               N-Back New
## 22 QWN0aXZpdHk6MTQ6MTAzOjA~       lamp.trails_b_new                             Trails B New
## 23 QWN0aXZpdHk6MTU6MTAzOjA~ lamp.trails_b_dot_touch                       Trails B Dot Touch
## 24 QWN0aXZpdHk6MTY6MTAzOjA~           lamp.jewels_a                          Jewels Trails A
## 25 QWN0aXZpdHk6MTc6MTAzOjA~           lamp.jewels_b                          Jewels Trails B
## 26 QWN0aXZpdHk6MTg6MTAzOjA~      lamp.scratch_image                            Scratch Image
## 27 QWN0aXZpdHk6MTk6MTAzOjA~         lamp.spin_wheel                               Spin Wheel

Get number of participants in the study

print(paste("Number of Participants:", length(participants)))

`Code Output`

## [1] "Number of Participants: 29"

Get engagement and plot activity histogram

engagement_data <- inner_join(results %>% dplyr::count(id), 
                              results %>% group_by(id) %>% filter(row_number()==n()) %>% select(id, timestamp), 
                              by="id") %>% 
  rename(activities.completed = n) %>% rename(most.recent.activity = timestamp)

engagement_data

`Code Output`

## # A tibble: 27 x 3
##    id          activities.completed most.recent.activity
##    <chr>                      <int> <dttm>              
##  1 U1005979819                  659 2019-11-16 22:50:57 
##  2 U1094374134                 2085 2019-08-14 12:23:15 
##  3 U1126469507                  503 2019-12-10 21:52:20 
##  4 U1232915366                  226 2020-01-29 00:51:34 
##  5 U1235780769                  767 2019-12-09 20:08:02 
##  6 U1367615199                  813 2019-11-13 16:11:09 
##  7 U1500960001                  742 2019-10-25 21:26:24 
##  8 U1680931766                 1070 2019-12-13 22:41:58 
##  9 U176381486                   585 2019-11-08 18:45:24 
## 10 U2127860149                  170 2019-11-13 16:51:52 
## # … with 17 more rows
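
The engagement code above takes the last row per participant as the most recent activity, which assumes the rows are already in chronological order. As a hedged alternative sketch, the following computes the same summary with max(timestamp), which does not depend on row order. The toy IDs and timestamps are made up.

```r
library(dplyr)

# Toy results table, deliberately not sorted by time (values are illustrative)
toy <- data.frame(
  id = c("U1", "U2", "U1", "U1"),
  timestamp = as.POSIXct(c("2019-11-02 10:00:00", "2019-11-01 09:00:00",
                           "2019-11-05 08:30:00", "2019-11-03 12:00:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Count rows per participant and take the latest timestamp explicitly
engagement <- toy %>%
  group_by(id) %>%
  summarise(activities.completed = n(),
            most.recent.activity = max(timestamp))
print(engagement)
```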

Let's view a histogram representation of the data.

hist(engagement_data$activities.completed, breaks=15)

`Code Output`

![../Topics/Preparing to analyze your data in R/Untitled.png](../Topics/Preparing to analyze your data in R/Untitled.png)

Get mean and standard deviation for any survey scale

We’ll include only anxiety survey (GAD-7) results and parse/convert their answer data to numbers using the readr library. Then, we aggregate by timestamp (which is unique to each Activity), and summarize the table.

library(readr)

data <- results %>% 
  filter(name == "GAD-7") %>% 
  mutate(value = as.numeric(parse_number(as.character(value)))) %>%
  group_by(id,timestamp) %>%
  summarise(average_GAD = mean(value))

summary(data$average_GAD)

`Code Output`

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.4286  0.8571  0.9139  1.4286  3.0000

print(paste("Mean GAD-7 across all participants:", round(mean(data$average_GAD), 3), "±", round(sd(data$average_GAD), 3)))

`Code Output`

## [1] "Mean GAD-7 across all participants: 0.914 ± 0.667"
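
To illustrate the parse-then-aggregate pattern used above, here is a small sketch on toy answers stored as text. The values, IDs, and session labels are made up; real answer strings may look different.

```r
library(dplyr)
library(readr)

# Toy GAD-7 answers stored as text (all values are illustrative)
toy <- data.frame(
  id = c("U1", "U1", "U2", "U2"),
  timestamp = c("t1", "t1", "t2", "t2"),
  value = c("0", "2", "3 (nearly every day)", "1"),
  stringsAsFactors = FALSE
)

per_session <- toy %>%
  mutate(value = parse_number(value)) %>%   # extracts the first number in each string
  group_by(id, timestamp) %>%
  summarise(average_GAD = mean(value), .groups = "drop")

# Mean and standard deviation across sessions
overall_mean <- mean(per_session$average_GAD)
overall_sd <- sd(per_session$average_GAD)
print(c(mean = overall_mean, sd = overall_sd))
```

Note that the standard deviation here is taken across per-session averages, not across individual answers, which mirrors the summary printed above.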

Plot survey scores over time for an individual participant

Filter the first Participant’s mood (PHQ-8) results and parse the strings into numbers as they may be either numeric or text. Then, we aggregate by timestamp, which is unique for each Activity, and take the mean of all scores for a given timestamp.

data <- results %>% 
  filter(id == participants[1] & name == "PHQ-8") %>%
  mutate(value = as.numeric(as.character(value))) %>%
  group_by(timestamp) %>%
  summarise(average_score = mean(value))

Plot the now-filtered data using the ggplot2 library; you’ll find our sample graph below.

library(ggplot2)
ggplot(data, aes(x = timestamp, y = average_score)) +
  geom_line(linewidth = 1, color = "steelblue") +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_minimal(base_size = 15) +
  ylim(0,4) +
  theme(
    panel.grid.major = element_blank(),
    panel.background = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.title.x = element_text(size = 15))

Compare self-reported anxiety with self-reported problematic internet use

First, we filter the data and use a left join (similar to a merge operation) between the GAD-7 and PIU-SF-6 survey instruments, so that each participant is a single row. All answer data is converted to numeric values first; we then group by participant ID and take the mean of all scores for each participant.

library(tidyr)
library(Hmisc)

data <- left_join(results %>% 
                    filter(name == "GAD-7") %>%
                    mutate(value = as.numeric(parse_number(as.character(value)))) %>%
                    group_by(id) %>%
                    summarise(average_GAD = mean(value)),
                  results %>% 
                    filter(name == "PIU-SF-6") %>%
                    mutate(value = as.numeric(parse_number(as.character(value)))) %>%
                    group_by(id) %>%
                    summarise(average_PIU = mean(value)),
                  by="id")

Now, fit a linear regression to the joined data.

fit <- lm(average_PIU ~ average_GAD, data = drop_na(data))
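
The join-then-fit steps can be sketched on toy data as follows. The per-participant averages are made up, and the PIU values are constructed as exactly 2 * GAD + 1 so the fitted coefficients are easy to check by eye.

```r
library(dplyr)
library(tidyr)

# Toy per-participant averages (values are illustrative)
gad <- data.frame(id = c("U1", "U2", "U3", "U4"), average_GAD = c(0, 1, 2, 3))
piu <- data.frame(id = c("U1", "U2", "U3"), average_PIU = c(1, 3, 5))

# left_join keeps every GAD-7 row; U4 has no PIU-SF-6 score, so it gets NA
joined <- left_join(gad, piu, by = "id")

# drop_na() removes the incomplete row before fitting
complete <- drop_na(joined)
fit_toy <- lm(average_PIU ~ average_GAD, data = complete)
print(coef(fit_toy))
```

With default settings lm() also omits rows containing NA on its own, so calling drop_na() first mainly keeps the fitted data and the plotted data identical.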

Plot the now-filtered data; you’ll find our sample graph below. (drop_na() removes rows with one or more NA values.)

ggplot(drop_na(data), aes(x = average_GAD, y = average_PIU)) +
  geom_point() +   
  geom_smooth(method = 'lm', formula = y~x) +
  # From https://sejohnston.com/2012/08/09/a-quick-and-easy-function-to-plot-lm-results-in-r/
  labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                "Intercept =",signif(fit$coef[[1]],5 ),
                " Slope =",signif(fit$coef[[2]], 5),
                " P =",signif(summary(fit)$coef[2,4], 5)),
       x = "Average GAD-7",
       y = "Average PIU-SF-6") + 
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.major = element_blank(),
    panel.background = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.title.x = element_text(size = 12))
