If you'd like to follow along with this tutorial but don't have an R development environment set up, consider using RStudio Cloud, a free service from the RStudio team.
First, install the LAMP client library.
Then connect to the LAMP API. Your Researcher ID comes from the dashboard at
Next, download all Participants and retrieve the ID mapping for all Activities in our Study.
Now we run some pre-processing to flatten the nested data structure that we receive from the `ActivityEvent` API for each Participant, performed in a loop over all Participants in our Study. Because some entries may be `null` (missing or invalid), we ignore those and flatten the data after converting timestamps from the UNIX Epoch standard to the human-readable `YYYY-MM-DD` format. There’s a workaround for excessively nested survey items as well.
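The flattening and timestamp conversion described above can be sketched in base R as follows. This is a minimal illustration, not the tutorial's own pre-processing code: the `events` list and its field names (`timestamp`, `activity`, `score`) are made-up stand-ins for an API response, and the sketch assumes timestamps are UNIX Epoch values in milliseconds.

```r
# Made-up events shaped like a nested API response (hypothetical field names).
events <- list(
  list(timestamp = 1577836800000, activity = "survey_a", score = "3"),
  NULL,  # a missing entry, which is skipped below
  list(timestamp = 1577923200000, activity = "survey_a", score = "5")
)

# Ignore null entries before flattening.
events <- Filter(Negate(is.null), events)

# Flatten each event into a one-row data frame, then stack the rows.
flat <- do.call(rbind, lapply(events, function(e) {
  # Assumption: timestamps are in milliseconds, so divide by 1000 for seconds.
  when <- as.POSIXct(e$timestamp / 1000, origin = "1970-01-01", tz = "UTC")
  data.frame(
    date = format(when, "%Y-%m-%d"),
    activity = e$activity,
    score = e$score,
    stringsAsFactors = FALSE
  )
}))

print(flat)
```

Fixing the time zone to UTC keeps the resulting dates reproducible regardless of the machine's locale; a different zone may be more appropriate for your Study.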
The section of code below will soon be deprecated and integrated into the LAMP API.
Now, we combine the data frames we’ve pre-processed and ensure they follow the proper data format. Once we reorder the columns to include only variables of interest, we can call the `head()` function to preview the data frame. To view all available variables in your data frame, use
Some of the columns we’re selecting give us metadata about the Activity (such as whether it’s a survey or game), the individual survey question or game level results, and other optional game data (`static_data`, which may include `NA` values). To learn more about what each of these columns contains, represents, and can be used for, please see the help topic.
This includes both default activities and custom activities.
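As a rough sketch of the combine-and-select step, the example below stacks two per-participant data frames with `dplyr::bind_rows()`, converts the date column, and keeps only a few columns. The data frames and column names (`id`, `date`, `activity`, `item`, `value`, `duration`) are invented for illustration and are not the tutorial's actual schema.

```r
library(dplyr)

# Two made-up, already-flattened per-participant data frames.
p1 <- data.frame(id = "P1", date = "2020-01-01", activity = "survey_a",
                 item = "q1", value = "2", duration = 10,
                 stringsAsFactors = FALSE)
p2 <- data.frame(id = "P2", date = "2020-01-02", activity = "survey_a",
                 item = "q1", value = "1", duration = 12,
                 stringsAsFactors = FALSE)

combined <- bind_rows(p1, p2) %>%
  mutate(date = as.Date(date)) %>%           # enforce a proper Date column
  select(id, date, activity, item, value)    # keep and reorder columns of interest

head(combined)    # preview the first rows
names(combined)   # base R: list the column names
```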
Let's view a histogram representation of the data.
![../Topics/Preparing to analyze your data in R/Untitled.png](../Topics/Preparing to analyze your data in R/Untitled.png)
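A histogram like this can be drawn with `ggplot2`. The sketch below uses invented scores; which variable the original histogram shows is not stated here, so the `value` column is only an assumption.

```r
library(ggplot2)

# Made-up numeric scores standing in for real survey results.
scores <- data.frame(value = c(0, 1, 1, 2, 2, 2, 3, 3))

p <- ggplot(scores, aes(x = value)) +
  geom_histogram(binwidth = 1) +   # one bar per integer score
  labs(x = "Score", y = "Count")

print(p)
```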
We’ll include only anxiety survey (`GAD-7`) results and parse/convert their answer data to numbers using the `readr` library. Then, we aggregate by timestamp (which is unique to each Activity), and summarize the table.
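One way to express this filter, parse, and aggregate sequence is sketched below with `dplyr` and `readr::parse_number()`. The `answers` data frame and its columns are made up, and summing item scores per timestamp is an assumption about what the aggregate should be.

```r
library(dplyr)
library(readr)

# Made-up long-format answers: one row per survey item.
answers <- data.frame(
  timestamp = c(1, 1, 2, 2, 2),
  activity  = c("GAD-7", "GAD-7", "GAD-7", "GAD-7", "PHQ-8"),
  value     = c("1", "2", "3", "3", "0"),
  stringsAsFactors = FALSE
)

gad7 <- answers %>%
  filter(activity == "GAD-7") %>%            # keep only the anxiety survey
  mutate(value = parse_number(value)) %>%    # strings to numbers
  group_by(timestamp) %>%                    # one group per completed survey
  summarize(total = sum(value), .groups = "drop")

print(gad7)
summary(gad7$total)
```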
Filter the first Participant’s mood (`PHQ-8`) results and parse the strings into numbers as they may be either numeric or text. Then, we aggregate by timestamp, which is unique for each Activity, and take the mean of all scores for a given timestamp.
Plot the now-filtered data using the `ggplot2` library; you’ll find our sample graph below.
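The filtering, averaging, and plotting steps above might look like the following sketch. All data and column names are invented, and the `timestamp` column is assumed to have been converted to a date already.

```r
library(dplyr)
library(readr)
library(ggplot2)

# Made-up PHQ-8 item answers for two participants.
answers <- data.frame(
  id        = c("P1", "P1", "P1", "P1", "P2"),
  timestamp = as.Date(c("2020-01-01", "2020-01-01",
                        "2020-01-08", "2020-01-08", "2020-01-01")),
  activity  = "PHQ-8",
  value     = c("1", "3", "3", "3", "0"),
  stringsAsFactors = FALSE
)

first_id <- answers$id[1]   # the first Participant in the table

mood <- answers %>%
  filter(id == first_id, activity == "PHQ-8") %>%
  mutate(value = parse_number(value)) %>%
  group_by(timestamp) %>%
  summarize(mean_score = mean(value), .groups = "drop")

# Mean item score over time for that Participant.
p <- ggplot(mood, aes(x = timestamp, y = mean_score)) +
  geom_line() +
  geom_point() +
  labs(x = "Date", y = "Mean PHQ-8 item score")

print(p)
```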
First, we filter data, using a left-join (which is like a merge operation) between the
PIU-6 survey instruments, such that each participant is a single row. All answer data is converted to numeric values first, and then we aggregate by ID, which is unique to each Activity. Finally, we take the mean of all scores for a given timestamp.
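For readers unfamiliar with left joins, here is a minimal `dplyr::left_join()` sketch on invented per-participant means. The `score_a` column is a placeholder for the first instrument, and `piu6` stands in for the PIU-6 mean; neither table reflects real data.

```r
library(dplyr)

# Made-up per-participant mean scores from two instruments.
first_scores <- data.frame(id = c("P1", "P2", "P3"),
                           score_a = c(1.5, 0.5, 2.0),
                           stringsAsFactors = FALSE)
piu_scores   <- data.frame(id = c("P1", "P3"),
                           piu6 = c(2.0, 3.5),
                           stringsAsFactors = FALSE)

# Keep every row of the left table; unmatched rows get NA on the right.
joined <- left_join(first_scores, piu_scores, by = "id")

print(joined)
```

Because P2 has no row in `piu_scores`, its `piu6` value is `NA` after the join, which is why missing values need handling before fitting or plotting.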
Now, do a regression fit on the filtered data.
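The type of regression is not specified here; assuming an ordinary least-squares linear fit, base R's `lm()` is one option. The data below are made up, with `x` and `y` standing in for the two survey scores.

```r
# Made-up paired scores, one row per participant.
scores <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Ordinary least-squares fit of y on x.
fit <- lm(y ~ x, data = scores)

print(coef(fit))   # intercept and slope
summary(fit)       # standard errors, R-squared, p-values
```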
Plot the now-filtered data; you’ll find our sample graph below. (`drop_na()` removes rows with one or more `NA` values.)
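A sketch of this plotting step, using `tidyr::drop_na()` and a scatter plot with a fitted line, is shown below. The data and the column names `score_a` and `piu6` are invented, and the linear trend line is an assumption about how the fit is displayed.

```r
library(tidyr)
library(ggplot2)

# Made-up joined scores; some participants are missing one instrument.
joined <- data.frame(
  id      = c("P1", "P2", "P3", "P4", "P5"),
  score_a = c(1.0, 2.0, NA, 3.0, 4.0),
  piu6    = c(2.0, NA, 3.0, 4.5, 5.0),
  stringsAsFactors = FALSE
)

# Remove rows with one or more NA values.
complete <- drop_na(joined)

p <- ggplot(complete, aes(x = score_a, y = piu6)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear trend line
  labs(x = "Score A (mean)", y = "PIU-6 (mean)")

print(p)
```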