Preparing to analyze your data in R

If you'd like to follow along with this tutorial but don't have an R development environment set up, consider using RStudio Cloud, a free service from the RStudio team.

Connect to your LAMP server.

First, install the LAMP client library.

install.packages("devtools")
devtools::install_github('BIDMCDigitalPsychiatry/LAMP-r') 

Then connect to the LAMP API. Your Researcher ID comes from the dashboard at dashboard.lamp.digital.

library(LAMP)
LAMP <- LAMP$new('https://api.lamp.digital', 'your_email_here@email.com', 'your_password_here')
researcher_id <- "YOUR_RESEARCHER_ID"
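Hard-coding a password in a script makes it easy to share by accident. A minimal sketch, assuming you store your credentials in environment variables named LAMP_EMAIL and LAMP_PASSWORD (names chosen here for illustration, e.g. set in your ~/.Renviron file), is a small hypothetical helper that reads them and fails early if either is missing:

```r
# Hypothetical helper (not part of the LAMP package): read LAMP credentials
# from environment variables instead of hard-coding them in the script.
# The variable names LAMP_EMAIL / LAMP_PASSWORD are an assumption.
get_lamp_credentials <- function(email_var = "LAMP_EMAIL",
                                 password_var = "LAMP_PASSWORD") {
  email <- Sys.getenv(email_var)
  password <- Sys.getenv(password_var)
  # Sys.getenv() returns "" for unset variables, so treat "" as missing
  missing_vars <- c(email_var, password_var)[c(email == "", password == "")]
  if (length(missing_vars) > 0) {
    stop("Missing environment variable(s): ", paste(missing_vars, collapse = ", "))
  }
  list(email = email, password = password)
}

# Usage (requires the LAMP package and a reachable server):
# creds <- get_lamp_credentials()
# LAMP <- LAMP$new('https://api.lamp.digital', creds$email, creds$password)
```

This keeps the password out of the script file itself, which matters if you share or version-control your analysis code.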

Download data from the LAMP server.

First, download all Participants and retrieve the ID mapping for all Activities in our Study.

library(dplyr)
library(anytime)
library(data.table)

participants <- LAMP$Participant$allByStudy(researcher_id) %>% pull(id)
activity_map <- LAMP$Activity$allByStudy(researcher_id) %>%
  dplyr::select(id, spec, name) %>%
  rename(activity = id)

Next, we pre-process the data in a loop over all Participants in our Study, flattening the nested data structure returned by the ActivityEvent API for each one. Participants with no events, and events with no nested results (missing or invalid entries), are skipped. Timestamps arrive as milliseconds since the UNIX epoch, so we divide them by 1000 and convert them to human-readable date-times in the America/New_York time zone. Finally, there is a workaround for deeply nested survey items: each event's nested temporal_events are unpacked into one row per entry.
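The loop below performs the timestamp conversion with anytime(). As a base R sketch of the same kind of conversion (milliseconds since the epoch to a local date-time), you can use as.POSIXct() with an explicit origin; the helper name epoch_ms_to_local and the sample timestamp are made up for illustration:

```r
# Hypothetical helper: convert a LAMP-style timestamp (milliseconds since
# 1970-01-01 UTC) to a date-time in the given time zone, using base R only.
epoch_ms_to_local <- function(ms, tz = "America/New_York") {
  as.POSIXct(as.numeric(ms) / 1000, origin = "1970-01-01", tz = tz)
}

ts <- epoch_ms_to_local(1575810000000)  # made-up example value
format(ts, "%Y-%m-%d %H:%M %Z")
```

Forgetting the division by 1000 is a common mistake: the result then lands tens of thousands of years in the future.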

⚠️ The code below will soon be deprecated, and its functionality integrated into the LAMP API.

data_list <- list()
for (i in seq_along(participants)) {
  # Fetch every ActivityEvent recorded for this Participant
  tmp <- LAMP$ActivityEvent$allByParticipant(participants[i])
  if (length(tmp) > 0) {
    # Attach Activity names and convert ms-since-epoch timestamps to date-times
    tmp <- right_join(activity_map %>% select(activity, name),
                      jsonlite::flatten(tmp) %>%
                        mutate(timestamp = anytime(as.numeric(timestamp) / 1000, tz = "America/New_York")) %>%
                        rename(activity_duration = duration) %>%
                        mutate(id = participants[i]),
                      by = "activity")
    # Expand each event's nested temporal_events into one row per entry
    tmp_event_list <- list()
    for (j in seq_len(nrow(tmp))) {
      if (length(tmp[j, ]$temporal_events[[1]]) > 0) {
        tmp_event_list[[j]] <- cbind(tmp[j, ], tmp[j, ]$temporal_events[[1]], row.names = NULL) %>%
          dplyr::select(-temporal_events)
      }
    }
    data_list[[i]] <- rbindlist(tmp_event_list, fill = TRUE)
  }
}
# Drop the empty slots left by Participants who had no events
data_list[sapply(data_list, is.null)] <- NULL
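The inner loop above expands each ActivityEvent into one row per entry of its nested temporal_events data frame (for example, one row per survey item), copying the event's own columns onto each row and skipping events with no entries. Here is a base R sketch of the same idea on a toy table; the function name unnest_temporal_events and the columns id, activity, item, and value are illustrative, not the full LAMP schema:

```r
# Toy ActivityEvent table with a nested list-column of per-item results.
events <- data.frame(id = c("U1", "U2"), activity = c("act1", "act2"))
events$temporal_events <- list(
  data.frame(item = c("Q1", "Q2"), value = c(1, 3)),
  data.frame(item = character(0), value = numeric(0))  # empty: skipped
)

# Hypothetical helper: repeat each event's outer columns once per nested
# row, attach the nested columns, then stack all events together.
unnest_temporal_events <- function(events) {
  outer_cols <- setdiff(names(events), "temporal_events")
  pieces <- lapply(seq_len(nrow(events)), function(j) {
    nested <- events$temporal_events[[j]]
    if (is.null(nested) || nrow(nested) == 0) return(NULL)
    outer <- events[rep(j, nrow(nested)), outer_cols, drop = FALSE]
    cbind(outer, nested, row.names = NULL)  # reset row names
  })
  do.call(rbind, pieces)  # rbind() ignores the NULL pieces
}

flat <- unnest_temporal_events(events)
flat
```

Passing row.names = NULL to cbind() discards the duplicated row names produced by repeating a row, which is the usual way to avoid row-name warnings when binding data frames of different origins.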

Now we combine the pre-processed data frames into a single data frame and make sure the id, activity, and name columns are stored as character vectors. Call the head() function to preview the result; to list every available variable in your data frame, use colnames(results). Among the columns are metadata about the Activity (such as whether it is a survey or a game), the individual survey question or game level results, and other optional game data (static_data, which may contain NA values). To learn more about what each of these columns contains, represents, and can be used for, please see the help topic.

results <- rbindlist(data_list, fill = TRUE) %>%
  mutate(id = as.character(id),
         activity = as.character(activity),
         name = as.character(name))

head(results)

Code Output

id  timestamp  name                                                                                                 item value
1 U1094374134 2019-12-08 PHQ-8       How often did you feel bad about yourself, or that you were a failure or let your family down?     1
2 U1094374134 2019-12-08 PHQ-8          How often did have you have trouble concentrating on things such as reading or watching tv?     1
3 U1094374134 2019-12-08 PHQ-8       How often did you find yourself moving so slowly, or so fidgety/restless, that others noticed?     1
4 U1094374134 2019-12-08 PHQ-8 How often did you have trouble falling or staying asleep, or sleep for more hours than you meant to?     1
5 U1094374134 2019-12-08 PHQ-8                                          How often did you feel tired or like you had little energy?     1
6 U1094374134 2019-12-08 PHQ-8                  How often did you find yourself with no appetite, or eating more than you meant to?     1

Optional: Output the data to a CSV.

write.csv(results, "./output-1-27-20.csv", row.names = FALSE)
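The file name above hard-codes the export date. A small base R sketch that builds the name from today's date instead; the toy data frame stands in for results, and tempdir() is used only so the example does not write into your project folder:

```r
# Toy data standing in for `results`.
results_demo <- data.frame(id = c("U1", "U2"), name = "PHQ-8", value = c(1, 2))

# Build a name such as "output-YYYY-MM-DD.csv" from the current date.
out_file <- file.path(tempdir(), sprintf("output-%s.csv", format(Sys.Date(), "%Y-%m-%d")))
write.csv(results_demo, out_file, row.names = FALSE)

# Read the file back to confirm the export round-trips.
check <- read.csv(out_file)
```

An ISO-style YYYY-MM-DD date in the file name also makes successive exports sort chronologically in a file browser.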

To learn more about the data types and format, see here:

Appendix: Sample Analyses

Check out all the activities in the study

This includes both default activities and custom activities.

activity_map %>% select(activity, spec, name)

Code Output

##                    activity                    spec                                     name
## 1  QWN0aXZpdHk6MDoxMDM6MjM~              lamp.group                    Daily Survey Check-In
## 2  QWN0aXZpdHk6MToxMDM6MzIw             lamp.survey                                    PHQ-8
## 3  QWN0aXZpdHk6MToxMDM6MzQx             lamp.survey                             Instructions
## 4  QWN0aXZpdHk6MToxMDM6MzQy             lamp.survey                                    GAD-7
## 5  QWN0aXZpdHk6MToxMDM6MzQz             lamp.survey                                 PIU-SF-6
## 6  QWN0aXZpdHk6MToxMDM6MzQ0             lamp.survey                                  Warning
## 7  QWN0aXZpdHk6MToxMDM6MzQ1             lamp.survey Qualitative Digital Media Use Assessment
## 8  QWN0aXZpdHk6MToxMDM6MzQ2             lamp.survey           Screen Time Use - iPhones only
## 9  QWN0aXZpdHk6MToxMDM6MzQ3             lamp.survey                iPhone/Android Assessment
## 10 QWN0aXZpdHk6MjoxMDM6MA~~              lamp.nback                                   N-Back
## 11 QWN0aXZpdHk6MzoxMDM6MA~~           lamp.trails_b                                 Trails B
## 12 QWN0aXZpdHk6NDoxMDM6MA~~       lamp.spatial_span                             Spatial Span
## 13 QWN0aXZpdHk6NToxMDM6MA~~      lamp.simple_memory                            Simple Memory
## 14 QWN0aXZpdHk6NjoxMDM6MA~~           lamp.serial7s                                Serial 7s
## 15 QWN0aXZpdHk6NzoxMDM6MA~~      lamp.cats_and_dogs                            Cats and Dogs
## 16 QWN0aXZpdHk6ODoxMDM6MA~~     lamp.3d_figure_copy                           3D Figure Copy
## 17 QWN0aXZpdHk6OToxMDM6MA~~ lamp.visual_association                       Visual Association
## 18 QWN0aXZpdHk6MTA6MTAzOjA~         lamp.digit_span                               Digit Span
## 19 QWN0aXZpdHk6MTE6MTAzOjA~  lamp.cats_and_dogs_new                        Cats and Dogs New
## 20 QWN0aXZpdHk6MTI6MTAzOjA~     lamp.temporal_order                           Temporal Order
## 21 QWN0aXZpdHk6MTM6MTAzOjA~          lamp.nback_new                               N-Back New
## 22 QWN0aXZpdHk6MTQ6MTAzOjA~       lamp.trails_b_new                             Trails B New
## 23 QWN0aXZpdHk6MTU6MTAzOjA~ lamp.trails_b_dot_touch                       Trails B Dot Touch
## 24 QWN0aXZpdHk6MTY6MTAzOjA~           lamp.jewels_a                          Jewels Trails A
## 25 QWN0aXZpdHk6MTc6MTAzOjA~           lamp.jewels_b                          Jewels Trails B
## 26 QWN0aXZpdHk6MTg6MTAzOjA~      lamp.scratch_image                            Scratch Image
## 27 QWN0aXZpdHk6MTk6MTAzOjA~         lamp.spin_wheel                               Spin Wheel
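Because the spec column identifies each Activity's type (for example, lamp.survey for surveys), you can filter or tabulate on it, provided spec is kept when building activity_map. A base R sketch using a few rows copied from the output above:

```r
# A few rows copied from the activity_map output above.
activity_demo <- data.frame(
  activity = c("QWN0aXZpdHk6MToxMDM6MzIw", "QWN0aXZpdHk6MToxMDM6MzQy",
               "QWN0aXZpdHk6MjoxMDM6MA~~", "QWN0aXZpdHk6MzoxMDM6MA~~"),
  spec = c("lamp.survey", "lamp.survey", "lamp.nback", "lamp.trails_b"),
  name = c("PHQ-8", "GAD-7", "N-Back", "Trails B")
)

# Keep only survey Activities.
surveys <- activity_demo[activity_demo$spec == "lamp.survey", ]

# Count Activities of each type.
spec_counts <- table(activity_demo$spec)
spec_counts
```

With dplyr loaded, the equivalent filter on the real data is activity_map %>% filter(spec == "lamp.survey").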

Get number of participants in the study

print(paste("Number of Participants:", length(participants)))

Code Output

## [1] "Number of Participants: 29"

Get engagement and plot activity histogram

engagement_data <- results %>%
  group_by(id) %>%
  summarise(activities.completed = n(),                 # rows recorded per Participant
            most.recent.activity = max(timestamp))      # latest event, regardless of row order

engagement_data

Code Output

## # A tibble: 27 x 3
##    id          activities.completed most.recent.activity
##    <chr>                      <int> <dttm>              
##  1 U1005979819                  659 2019-11-16 22:50:57 
##  2 U1094374134                 2085 2019-08-14 12:23:15 
##  3 U1126469507                  503 2019-12-10 21:52:20 
##  4 U1232915366                  226 2020-01-29 00:51:34 
##  5 U1235780769                  767 2019-12-09 20:08:02 
##  6 U1367615199                  813 2019-11-13 16:11:09 
##  7 U1500960001                  742 2019-10-25 21:26:24 
##  8 U1680931766                 1070 2019-12-13 22:41:58 
##  9 U176381486                   585 2019-11-08 18:45:24 
## 10 U2127860149                  170 2019-11-13 16:51:52 
## # … with 17 more rows
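However you compute it, the most recent activity should not depend on the order in which the server returns events. Below is a base R sketch of the same per-Participant summary on made-up rows. It sorts by time explicitly before taking each Participant's last row and, like the code above, counts rows (one per survey question or game level result) rather than whole Activities:

```r
# Made-up event rows, deliberately not in chronological order.
events <- data.frame(
  id = c("U1", "U1", "U1", "U2", "U2"),
  timestamp = as.POSIXct(c("2019-12-08 09:00:00", "2019-08-14 12:00:00",
                           "2019-10-01 18:30:00", "2019-11-13 16:00:00",
                           "2019-11-16 22:50:00"), tz = "America/New_York")
)

# Sort by Participant, then by time, so each Participant's last row is the latest.
sorted <- events[order(events$id, events$timestamp), ]
latest <- sorted[!duplicated(sorted$id, fromLast = TRUE), c("id", "timestamp")]
names(latest)[2] <- "most.recent.activity"

# Rows recorded per Participant.
counts <- as.data.frame(table(id = events$id), responseName = "activities.completed",
                        stringsAsFactors = FALSE)

engagement_demo <- merge(counts, latest, by = "id")
engagement_demo
```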

Let's view the distribution of completed activities as a histogram.

hist(engagement_data$activities.completed, breaks=15)

Code Output

[Figure: histogram of activities completed per Participant]
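When breaks is a single number, hist() treats it only as a suggestion and chooses "pretty" break points near that count. To inspect the bins actually used without drawing anything, call hist() with plot = FALSE; the values below are the ten activity counts printed in the engagement_data output above:

```r
# Activity counts for the ten Participants printed above.
activities_completed <- c(659, 2085, 503, 226, 767, 813, 742, 1070, 585, 170)

h <- hist(activities_completed, breaks = 15, plot = FALSE)
h$breaks  # bin edges actually chosen
h$counts  # number of Participants in each bin
```

If you need exact bins, pass a vector of break points instead, e.g. breaks = seq(0, 2500, by = 100).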

Get mean and standard deviation for any survey scale

We’ll keep only the anxiety survey (GAD-7) results and convert their answers to numbers with parse_number() from the readr library. Then we group by Participant and timestamp (each timestamp identifies one completed survey), average the item scores within each survey, and summarize the resulting per-survey averages.

library(readr)

data <- results %>% 
  filter(name == "GAD-7") %>% 
  mutate(value = as.numeric(parse_number(as.character(value)))) %>%
  group_by(id,timestamp) %>%
  summarise(average_GAD = mean(value))

summary(data$average_GAD)

Code Output

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.4286  0.8571  0.9139  1.4286  3.0000

print(paste("Mean GAD-7 across all participants:", round(mean(data$average_GAD), 3), "±", round(sd(data$average_GAD), 3)))

Code Output

## [1] "Mean GAD-7 across all participants: 0.914 ± 0.667"
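The statistic above is a two-step average: each completed GAD-7 survey is first reduced to the mean of its item scores, and the mean and standard deviation are then taken across those per-survey averages. A base R sketch on made-up responses (as.numeric() stands in for readr::parse_number() here because the toy answers are already plain numbers):

```r
# Made-up GAD-7 item responses: two surveys from U1, one from U2.
gad <- data.frame(
  id        = c("U1", "U1", "U1", "U1", "U2", "U2"),
  timestamp = c(1, 1, 2, 2, 1, 1),
  value     = c("0", "1", "2", "3", "1", "1")
)
gad$value <- as.numeric(gad$value)

# Step 1: one average per survey (Participant + timestamp).
per_survey <- aggregate(value ~ id + timestamp, data = gad, FUN = mean)

# Step 2: mean and standard deviation across the per-survey averages.
m <- mean(per_survey$value)
s <- sd(per_survey$value)
sprintf("Mean GAD-7: %.3f \u00b1 %.3f", m, s)
```

Because every survey counts once, Participants who complete GAD-7 more often carry more weight in this summary; averaging per Participant first would weight Participants equally instead.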

Plot survey scores over time for an individual participant

Filter the first Participant’s mood (PHQ-8) results and convert the answers to numbers, since they may be stored as either numbers or text. Then we group by timestamp (each timestamp identifies one completed survey) and take the mean of all item scores for that timestamp.

data <- results %>% 
  filter(id == participants[1] & name == "PHQ-8") %>%
  mutate(value = readr::parse_number(as.character(value))) %>%
  group_by(timestamp) %>%
  summarise(average_score = mean(value))

Plot the now-filtered data using the ggplot2 library; you’ll find our sample graph below.

library(ggplot2)
ggplot(data, aes(x = timestamp, y = average_score)) +
  geom_line(size=1, color="steelblue") +
  theme_minimal(base_size = 15) +
  ylim(0,4) +
  theme(
    panel.grid.major = element_blank(),
    panel.background = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.title.x = element_text(size = 15))

Code Output

[Figure: the first Participant's average PHQ-8 score over time]
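participants[1] is simply the first ID returned by the server and may have no PHQ-8 responses: the Study above lists 29 Participants, but engagement_data has only 27 rows. A base R sketch that picks the first Participant who actually has PHQ-8 rows, shown on a made-up stand-in for participants and results:

```r
# Made-up stand-ins for `participants` and `results`.
participants_demo <- c("U0000000001", "U1094374134", "U1126469507")
results_demo <- data.frame(
  id   = c("U1094374134", "U1094374134", "U1126469507"),
  name = c("PHQ-8", "GAD-7", "PHQ-8")
)

# Participants (in the order returned) with at least one PHQ-8 row.
phq8_ids <- results_demo$id[results_demo$name == "PHQ-8"]
has_phq8 <- participants_demo[participants_demo %in% phq8_ids]
if (length(has_phq8) == 0) stop("No Participant has PHQ-8 data")
selected_id <- has_phq8[1]
```

You would then filter with id == selected_id instead of id == participants[1].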

Compare self-reported anxiety with self-reported problematic internet use

First, we compute each Participant's average GAD-7 score and average PIU-SF-6 score, converting all answers to numeric values beforehand and grouping by Participant ID. We then combine the two summaries with a left join (similar to a merge operation) on ID, so that each Participant is a single row.

library(tidyr)  # drop_na()
library(readr)  # parse_number()

data <- left_join(results %>% 
                    filter(name == "GAD-7") %>%
                    mutate(value = as.numeric(parse_number(as.character(value)))) %>%
                    group_by(id) %>%
                    summarise(average_GAD = mean(value)),
                  results %>% 
                    filter(name == "PIU-SF-6") %>%
                    mutate(value = as.numeric(parse_number(as.character(value)))) %>%
                    group_by(id) %>%
                    summarise(average_PIU = mean(value)),
                  by="id")

Next, fit a linear regression of the average PIU-SF-6 score on the average GAD-7 score, using only Participants who have both scores.

fit <- lm(average_PIU ~ average_GAD, data = drop_na(data))

Plot the now-filtered data; you’ll find our sample graph below. (drop_na() removes rows with one or more NA values.)

ggplot(drop_na(data), aes(x = average_GAD, y = average_PIU)) +
  geom_point() +   
  geom_smooth(method = 'lm', formula = y~x) +
  # From https://sejohnston.com/2012/08/09/a-quick-and-easy-function-to-plot-lm-results-in-r/
  labs(title = paste0("Adj R2 = ", signif(summary(fit)$adj.r.squared, 5),
                      ", Intercept = ", signif(coef(fit)[[1]], 5),
                      ", Slope = ", signif(coef(fit)[[2]], 5),
                      ", P = ", signif(summary(fit)$coefficients[2, 4], 5)),
       x = "Average GAD-7",
       y = "Average PIU-SF-6") + 
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.major = element_blank(),
    panel.background = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.title.x = element_text(size = 12))

Code Output

[Figure: average PIU-SF-6 versus average GAD-7 per Participant, with linear fit]
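The left join keeps every Participant who has GAD-7 data and fills in NA where no PIU-SF-6 score exists; those rows are removed before fitting. The same steps in base R, on made-up per-Participant averages, with merge(all.x = TRUE) standing in for left_join() and complete.cases() for drop_na():

```r
# Made-up per-Participant averages; U5 has no PIU-SF-6 score.
gad_avg <- data.frame(id = paste0("U", 1:5), average_GAD = c(0.5, 1.0, 1.5, 2.0, 2.5))
piu_avg <- data.frame(id = paste0("U", 1:4), average_PIU = c(1.2, 1.8, 3.1, 3.9))

# Left join: all.x = TRUE keeps U5, with average_PIU = NA.
joined <- merge(gad_avg, piu_avg, by = "id", all.x = TRUE)
complete_rows <- joined[complete.cases(joined), ]

fit <- lm(average_PIU ~ average_GAD, data = complete_rows)
coefs <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
sprintf("Adj R2 = %.3f, Intercept = %.3f, Slope = %.3f, P = %.3g",
        summary(fit)$adj.r.squared, coefs[1, "Estimate"],
        coefs[2, "Estimate"], coefs[2, "Pr(>|t|)"])
```

lm() would drop the NA row on its own (its default na.action is na.omit), but filtering explicitly guarantees that the model and the plot use exactly the same Participants.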

Was there something we didn't cover, or do you need more help? Let us know by posting in the LAMP Community, or contact us directly. Thank you for your contribution! 🌟 Page last updated on June 30th, 2020.