University of Missouri Columbia R Questions Lab Report
Lab 2
Your Name Here
Date Here
This lab will explore some aspects of the tidyverse packages in R. Below are three data sets that have already been loaded using the read_csv() function from readr. NOTE: if you get an error loading the data, you need to either: 1) make sure your lab_r.Rmd file is in the same folder as the data files, or 2) set the file paths correctly inside the read_csv() functions.
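If the file-path fix is unclear, this sketch may help. It writes a tiny made-up CSV to R's temporary folder and reads it back with `read_csv()`. The file name `demo_training.csv` and its two columns are hypothetical, not the lab's data. `file.path()` builds the full path the way option 2) describes, and `col_types = cols()` asks readr to guess every column type without printing the column specification.

```r
library(readr)

# Hypothetical folder and file; the lab's real CSVs are not used here
demo_dir  <- tempdir()
demo_path <- file.path(demo_dir, "demo_training.csv")

# Write a tiny CSV so the example is self-contained
writeLines(c("Player Name,Distance",
             "A,5000",
             "B,6200"), demo_path)

# col_types = cols() lets readr guess each column type quietly
demo <- read_csv(demo_path, col_types = cols())

print(demo)
```

To read the lab files the same way, replace `demo_dir` with the folder that actually holds them.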
For this lab, each question has pieces of code that need to be filled in. You will need to read the comments following the code to determine what you need to do. Because the code below is incomplete, the lab will not compile (knit) until it is complete. I suggest you work in a separate RStudio script and copy your code into the lab once it is complete.
library(tidyverse)
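## ── Attaching packages ──────────────────────────────
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## ── Conflicts ───────────────────────────────────────
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()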
Question 1
Let's assume you need to take data from the GPS Training Data and make a plot comparing the players' average distance, maximum velocity, and total load over the duration of the training session. Decomposing this into smaller steps, we will need to:
1. Group information by players,
2. Summarize the pieces of information by player,
3. Reorganize the data so it can be plotted, and
4. Plot the data.
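Before filling in the skeleton, it can help to try steps 1 to 3 on a tiny made-up table. This is a minimal sketch: the columns `player`, `distance`, `velocity`, and `load` are hypothetical stand-ins, not the names used in the GPS training file, so you still need to look those up for the real code.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the GPS training data
toy_train <- tibble(
  player   = c("A", "A", "B", "B"),
  distance = c(100, 300, 200, 400),
  velocity = c(6.5, 7.0, 8.0, 7.5),
  load     = c(50, 70, 60, 40)
)

toy_summary <- toy_train %>%
  group_by(player) %>%                        # step 1: one group per player
  summarise(Dist_mean  = mean(distance),      # step 2: one row per player
            Total_load = sum(load),
            Max_Vel    = max(velocity),
            .groups    = "drop")

toy_long <- toy_summary %>%
  pivot_longer(-player, names_to = "Measure", values_to = "Val")  # step 3

print(toy_summary)
print(toy_long)
```

`toy_summary` has one row per player and three measure columns (wide format); `pivot_longer()` stacks those columns into `Measure`/`Val` pairs, giving three rows per player (long format).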
Below is a skeleton of the code that follows these four steps. You need to fill in the code with the correct arguments. Be sure to read the comments; not everything has to be filled in, just pieces.
match <- read_csv("GPS_Match_Data_deidentified.csv", col_types = cols())
train <- read_csv("GPS_Training_Data_deidentified.csv", col_types = cols())
wellness <- read_csv("Wellness_Data_deidentified.csv", col_types = cols())
# this creates a new data frame, player_df, that has average distance, maximum velocity, and total load
player_df = train %>% # select the data set you wish to pull information from
  group_by() %>% # FILL IN, this should choose your grouping variable
  summarise(Dist_mean = mean(), Total_load = sum(), Max_Vel = max()) %>% # FILL IN, this should summarize each measure by player
  rename(Player_Name = ) # relabel the Player Name column to a more usable name. This is NOT required, but the pivot step below assumes the name Player_Name
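# this reorganizes the data frame, player_df, from wide format to long format. We do this because ggplot needs the measures stacked in one column (Measure) so they can be mapped to the bar fill
player_df = player_df %>% # select the data set you wish to manipulate
  pivot_longer(-Player_Name, names_to = "Measure", values_to = "Val") # pivots the df from wide to long
  # you will need to change 'Player_Name' if you did not rename the column above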
# This plots the data
ggplot(player_df, aes(x = , y = log(), fill = Measure, width = 0.5)) + # FILL IN, this will be the x and y values
  geom_bar(position = 'dodge', stat = 'identity') +
  xlab('') + # FILL IN, this will label your x axis
  ylab('') + # FILL IN, this will label your y axis
  ggtitle('') + # FILL IN, this will title your plot
  scale_fill_manual(name = 'Measure', labels = c('Mean Distance', 'Max Velocity', 'Total Load'), values …
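A sketch of the plotting step on hypothetical long-format data that has the same `Measure` and `Val` columns as `player_df`; the player names, values, and fill colors are arbitrary choices. One detail worth checking: `scale_fill_manual()` applies `labels` in the sorted order of the `Measure` values (`Dist_mean`, `Max_Vel`, `Total_load`), which is why the labels read Mean Distance, Max Velocity, Total Load. `log(Val)` keeps measures on very different scales visible on one axis. Here `geom_col()`, which is equivalent to `geom_bar(stat = 'identity')`, receives a fixed `width` as a parameter.

```r
library(ggplot2)
library(tibble)

# Hypothetical long-format data shaped like player_df after pivot_longer()
toy_long <- tibble(
  player  = rep(c("A", "B"), each = 3),
  Measure = rep(c("Dist_mean", "Total_load", "Max_Vel"), times = 2),
  Val     = c(200, 120, 7, 300, 100, 8)
)

p <- ggplot(toy_long, aes(x = player, y = log(Val), fill = Measure)) +
  geom_col(position = "dodge", width = 0.5) +   # one bar per measure, side by side
  xlab("Player") +
  ylab("log(value)") +
  ggtitle("Toy example") +
  # labels follow the sorted Measure values: Dist_mean, Max_Vel, Total_load
  scale_fill_manual(name = "Measure",
                    labels = c("Mean Distance", "Max Velocity", "Total Load"),
                    values = c("grey30", "steelblue", "darkorange"))  # arbitrary colors

print(p)
```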
Answer
Question 2
For this question, we want to visualize fatigue and stress from the wellness data by day, grouping players by position. As a general outline, we will:
1. Ensure the Timestamp column is properly coded as a date-time variable,
2. Group by date and position,
3. Summarize the fatigue and stress by position and time,
4. Reshape the data from wide to long format (for plotting), and
5. Plot the data, making sure to label everything correctly.
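Steps 2 to 4 can be tried on a small made-up table. In this sketch the columns `Timestamp`, `Position`, `fatigue`, and `stress` are hypothetical, and `Timestamp` is already a `Date`, which is what step 1 produces. Grouping by two variables gives one summary row per day and position.

```r
library(dplyr)
library(tidyr)

# Hypothetical wellness-style data (column names are illustrative only)
toy_wellness <- tibble(
  Timestamp = as.Date(c("2020-01-06", "2020-01-06", "2020-01-06", "2020-01-07")),
  Position  = c("Forward", "Forward", "Defender", "Forward"),
  fatigue   = c(3, 5, 5, 2),
  stress    = c(2, 4, 2, 3)
)

toy_well <- toy_wellness %>%
  group_by(Timestamp, Position) %>%   # one group per day and position
  summarise(Fatigue = mean(fatigue),
            Stress  = mean(stress),
            .groups = "drop")

# keep Position and Timestamp as identifiers; stack Fatigue and Stress
toy_well_long <- toy_well %>%
  pivot_longer(-c(Position, Timestamp), names_to = "Measure", values_to = "Value")

print(toy_well)
print(toy_well_long)
```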
To handle date-time variables with the tidyverse, we will need to use the lubridate library. Below are two lines to install and load the lubridate library. All code to handle date-time variables has been supplied for this homework; you only need to run it and understand what it does. NOTE: be sure to run the install.packages() line only once; either comment it out or delete it after you run it.
# install.packages('lubridate')
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
Below is a skeleton of the code that follows these five steps. You need to fill in the code with the correct arguments. Be sure to read the comments; not everything has to be filled in, just pieces.
# create a new df called well with the correct summarized variables
well = wellness %>% # select wellness data
  mutate(Timestamp = mdy(str_split(Timestamp, pattern = " ", simplify = TRUE)[,1])) %>% # just run this; it keeps the date part of Timestamp and parses it as a month/day/year date
  group_by(,) %>% # FILL IN, this should choose the two variables to group by
  summarise(Fatigue = mean(), Stress = mean()) # FILL IN, this should calculate the summary variables

# run this to see what well looks like right now
# well
# change from wide to long
well_df = well %>%
  pivot_longer(-c(Position, Timestamp), names_to = "Measure", values_to = "Value")
# run this to see what well_df looks like right now
# well_df

# plot the data
ggplot(well_df, aes(x = , y = )) + # FILL IN, this should input the correct x and y variables
  geom_line(aes(color = Position, linetype = Measure), size = 1) + # this makes it a line plot, colors the lines by position, and sets the line type by measure
  scale_x_date(date_breaks = 'day', date_labels = "%b %d") + # this makes the x axis show up as Month Day
  ylim(c(2, 5)) + # this sets the limits on the y axis
  xlab('') + # label x if needed
  ylab('') + # label y if needed
  ggtitle('') + # title if needed
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10), # this is just formatting; it rotates the x axis labels and sets the text size
        axis.text.y = element_text(size = 10))
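To see what the supplied `mutate()` line does, you can run its pieces on a couple of made-up timestamps. The "month/day/year time" format below is an assumption about the wellness file. `str_split(..., simplify = TRUE)` returns a character matrix, `[, 1]` keeps the date part, and `mdy()` turns it into a `Date`, which is the class `scale_x_date()` expects.

```r
library(stringr)
library(lubridate)

# Hypothetical raw timestamps: "month/day/year hour:minute:second"
raw <- c("1/6/2020 8:15:00", "1/7/2020 9:02:30")

# Split on the space; simplify = TRUE gives a matrix, [, 1] keeps the date part
date_part <- str_split(raw, pattern = " ", simplify = TRUE)[, 1]

# mdy() parses month/day/year strings into Date objects
dates <- mdy(date_part)

print(date_part)
print(dates)
```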
Answer
Question 3
For the last question, we want to create violin plots of the log of the velocity band total distances by position from the GPS_Match_Data_deidentified.csv data set. We also want to include the 25, 50, and 75 percent quantiles as horizontal bars within each group. The general outline is:
1. Choose the correct data set,
2. Select the columns Position Name and Velocity Band 2 Total Distance through Velocity Band 8 Total Distance,
3. Manipulate the data from wide format to long, making sure NOT to pivot the Position Name column (i.e., use -`Position Name` like we did in Q1 and Q2), and
4. Plot the data, making sure to plot on the log scale.
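A sketch of steps 2 and 3 on a hypothetical table. Column names containing spaces need backticks, and `:` inside `select()` picks every column from the first name through the second. Only bands 2 to 4 appear here to keep it short, and `Player Load` is an invented extra column that `select()` drops.

```r
library(dplyr)
library(tidyr)

# Hypothetical match-style data; names with spaces are wrapped in backticks
toy_match <- tibble(
  `Position Name`                  = c("Forward", "Defender"),
  `Velocity Band 2 Total Distance` = c(1200, 1500),
  `Velocity Band 3 Total Distance` = c(800, 600),
  `Velocity Band 4 Total Distance` = c(0, 150),
  `Player Load`                    = c(400, 380)   # invented column we do NOT want
)

toy_velocity <- toy_match %>%
  select(`Position Name`,
         `Velocity Band 2 Total Distance`:`Velocity Band 4 Total Distance`) %>%
  pivot_longer(-`Position Name`, names_to = "Band", values_to = "Distance")

print(toy_velocity)
```

Note the zero distance: `log(0)` is `-Inf`, so rows like that are typically dropped, with a warning, when the data are plotted on a log scale.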
Below is a skeleton of the code that follows these four steps. You need to fill in the code with the correct arguments. Be sure to read the comments; not everything has to be filled in, just pieces.
# create the data set for plotting
position_velocity = %>% # FILL IN, select the correct data
select() %>% # FILL IN, select the correct columns
pivot_longer(, names_to = , values_to = ) # FILL IN, select the correct column to pivot
ggplot(position_velocity, aes(x = , y = )) + # FILL IN, select the x and y values
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) + # this adds horizontal bars as quantiles
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10)) # rotates the x axis text
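A sketch of the violin plot on simulated data: random log-normal distances for two hypothetical positions. It uses `scale_y_log10()` to put the y axis on the log scale; wrapping the variable in `log()` inside `aes()`, as in Q1, is the other common option. `draw_quantiles` draws the 25, 50, and 75 percent bars inside each violin.

```r
library(ggplot2)
library(tibble)

# Simulated long-format data shaped like position_velocity (hypothetical)
set.seed(1)
toy_long <- tibble(
  Position = rep(c("Forward", "Defender"), each = 50),
  Distance = c(rlnorm(50, meanlog = 6), rlnorm(50, meanlog = 5.5))
)

p <- ggplot(toy_long, aes(x = Position, y = Distance)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +  # quartile bars
  scale_y_log10() +                                   # log-scale y axis
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10))

print(p)
```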
Answer