Depending on your knowledge of R, getting an overview of the data we imported last week might have been quite a challenge. Surprisingly enough, importing, cleaning and exploring your data can be the most challenging and time-consuming part of a project. RStudio and the tidyverse offer many helpful tools to make this part easier (and more fun). You have read chapters on dplyr and magrittr in preparation for this exercise. Before we start with the exercise, however, this demo illustrates a simple tidyverse approach that is also applicable to sf objects.
Assume we want to calculate the time lag between subsequent positions. To achieve this, we can use the function difftime() combined with lead() from dplyr. Let’s look at these functions one by one.
difftime
difftime() takes two POSIXct values and returns the time difference between them.
now <- as.POSIXct("2024-04-26 10:20:00")
later <- as.POSIXct("2024-04-26 11:35:00")
# Alternatively, add seconds to an existing POSIXct value (this overwrites `later`)
later <- now + 10000
later
## [1] "2024-04-26 13:06:40 CEST"
time_difference <- difftime(later, now)
time_difference
## Time difference of 2.777778 hours
You can also specify the unit of the output.
time_difference <- difftime(later, now, units = "secs")
time_difference
## Time difference of 10000 secs
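As a small sketch of the units argument, the block below requests minutes directly and also converts an existing difftime object to hours afterwards. The names t_start, t_end, in_mins and d are made up for this illustration, and tz = "UTC" is an assumption added only so the result does not depend on the local time zone.

```r
# Hypothetical names; tz = "UTC" is an assumption for reproducibility
t_start <- as.POSIXct("2024-04-26 10:20:00", tz = "UTC")
t_end <- t_start + 10000  # adding a number to a POSIXct adds seconds

# Ask for a specific unit directly
in_mins <- difftime(t_end, t_start, units = "mins")
in_mins

# Or change the unit of an existing difftime object afterwards
d <- difftime(t_end, t_start, units = "secs")
units(d) <- "hours"
d
```

If no unit is given, difftime() picks one automatically, which is why the earlier call reported hours.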
difftime returns an object of the class difftime.
class(time_difference)
## [1] "difftime"
str(time_difference)
## 'difftime' num 10000
## - attr(*, "units")= chr "secs"
However, in our case numeric values are handier than the class difftime, so we wrap the call in as.numeric():
time_difference <- as.numeric(difftime(later, now, units = "secs"))
str(time_difference)
## num 10000
class(time_difference)
## [1] "numeric"
In fact, we will use this exact operation multiple times, so let’s create a function for this:
difftime_secs <- function(later, now){
    as.numeric(difftime(later, now, units = "secs"))
}
lead() / lag()
lead() and lag() return a vector of the same length as the input, just offset by a specific number of values (default is 1). Consider the following sequence:
numbers <- 1:10
numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
We can now run lead() and lag() on this sequence to illustrate the output. The argument n specifies the offset, and default specifies the value used to “fill” the “empty spaces” that emerge in the vector. This helps us perform operations on subsequent values in a vector (or rows in a table).
library("dplyr")
lead(numbers)
##  [1]  2  3  4  5  6  7  8  9 10 NA
lead(numbers, n = 2)
##  [1]  3  4  5  6  7  8  9 10 NA NA
lag(numbers)
##  [1] NA  1  2  3  4  5  6  7  8  9
lag(numbers, n = 5)
##  [1] NA NA NA NA NA  1  2  3  4  5
lag(numbers, n = 5, default = 0)
##  [1] 0 0 0 0 0 1 2 3 4 5
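To make the idea of "operations on subsequent values" concrete, here is a minimal sketch that subtracts each value from the one following it. The vector values and the name step are made up for illustration and are not part of the wild boar data.

```r
library("dplyr")

# Made-up values for illustration
values <- c(3, 7, 12, 20)

# Difference between each value and the one that follows it.
# The last element has no successor, so lead() supplies NA there.
step <- lead(values) - values
step
```

The same pattern, applied to timestamps instead of plain numbers, is what yields the time lag between consecutive positions.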
mutate()
Using the above functions (difftime() and lead()), we can calculate the time lag, that is, the time difference between consecutive positions. We will try this on a dummy version of our wild boar dataset.
wildschwein <- tibble(
    TierID = c(rep("Hans", 5), rep("Klara", 5)),
    DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)
wildschwein
# A tibble: 10 × 2
TierID DatetimeUTC
<chr> <dttm>
1 Hans 2015-01-01 00:00:00
2 Hans 2015-01-01 00:15:00
3 Hans 2015-01-01 00:30:00
4 Hans 2015-01-01 00:45:00
5 Hans 2015-01-01 01:00:00
6 Klara 2015-01-01 00:00:00
7 Klara 2015-01-01 00:15:00
8 Klara 2015-01-01 00:30:00
9 Klara 2015-01-01 00:45:00
10 Klara 2015-01-01 01:00:00
If we want to calculate the speed travelled between subsequent locations, we need to calculate the elapsed time first. Since R performs most operations in a vectorized manner, we can apply difftime_secs() to the entire column DatetimeUTC of our dataframe wildschwein and store the output in a new column.
now <- wildschwein$DatetimeUTC
later <- lead(now)
wildschwein$timelag <- difftime_secs(later, now)
wildschwein
# A tibble: 10 × 3
TierID DatetimeUTC timelag
<chr> <dttm> <dbl>
1 Hans 2015-01-01 00:00:00 900
2 Hans 2015-01-01 00:15:00 900
3 Hans 2015-01-01 00:30:00 900
4 Hans 2015-01-01 00:45:00 900
5 Hans 2015-01-01 01:00:00 -3600
6 Klara 2015-01-01 00:00:00 900
7 Klara 2015-01-01 00:15:00 900
8 Klara 2015-01-01 00:30:00 900
9 Klara 2015-01-01 00:45:00 900
10 Klara 2015-01-01 01:00:00 NA
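However, we have an issue at the transition between the two animals: the last time lag for Hans (-3600) is computed against Klara's first timestamp. We can overcome this issue using dplyr’s mutate() together with group_by(). Note that inside mutate() we refer to columns by name, without the $ notation!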
# note the lack of "$"
wildschwein <- mutate(wildschwein, timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))
wildschwein
# A tibble: 10 × 3
TierID DatetimeUTC timelag
<chr> <dttm> <dbl>
1 Hans 2015-01-01 00:00:00 900
2 Hans 2015-01-01 00:15:00 900
3 Hans 2015-01-01 00:30:00 900
4 Hans 2015-01-01 00:45:00 900
5 Hans 2015-01-01 01:00:00 -3600
6 Klara 2015-01-01 00:00:00 900
7 Klara 2015-01-01 00:15:00 900
8 Klara 2015-01-01 00:30:00 900
9 Klara 2015-01-01 00:45:00 900
10 Klara 2015-01-01 01:00:00 NA
The output is identical to before: mutate() alone does not solve the problem, so we need group_by() as well.
group_by()
To distinguish groups in a dataframe, we need to specify these using group_by().
# again, note the lack of "$"
wildschwein <- group_by(wildschwein, TierID)
After adding this grouping variable, calculating the timelag automatically accounts for the individual trajectories.
# again, note the lack of "$"
# here difftime() is called directly, so timelag is a difftime (in minutes) rather than a number
wildschwein <- mutate(wildschwein, timelag = difftime(lead(DatetimeUTC), DatetimeUTC))
wildschwein
# A tibble: 10 × 3
# Groups: TierID [2]
TierID DatetimeUTC timelag
<chr> <dttm> <drtn>
1 Hans 2015-01-01 00:00:00 15 mins
2 Hans 2015-01-01 00:15:00 15 mins
3 Hans 2015-01-01 00:30:00 15 mins
4 Hans 2015-01-01 00:45:00 15 mins
5 Hans 2015-01-01 01:00:00 NA mins
6 Klara 2015-01-01 00:00:00 15 mins
7 Klara 2015-01-01 00:15:00 15 mins
8 Klara 2015-01-01 00:30:00 15 mins
9 Klara 2015-01-01 00:45:00 15 mins
10 Klara 2015-01-01 01:00:00 NA mins
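If numeric seconds are preferred over a difftime column, the helper defined earlier can be used inside the grouped mutate() instead. The sketch below repeats the helper and the dummy data so that it runs on its own; the name wildschwein_demo is made up to avoid overwriting the object used in this demo.

```r
library("dplyr")

# Helper from earlier in this demo, repeated so the block is self-contained
difftime_secs <- function(later, now) {
  as.numeric(difftime(later, now, units = "secs"))
}

# Same dummy data as above, under a hypothetical name
wildschwein_demo <- tibble(
  TierID = c(rep("Hans", 5), rep("Klara", 5)),
  DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)

# Grouping first means lead() never reaches into the next animal's rows
wildschwein_demo <- wildschwein_demo |>
  group_by(TierID) |>
  mutate(timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))

wildschwein_demo$timelag
```

The last row of each animal is NA because there is no following position within that group.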
Piping can simplify this process and lets us write a sequence of operations in the order in which we would explain it to another person. To do so, we can use either pipe operator (|> or %>%); for our purposes it does not matter which.
wildschwein |>                                              # Take wildschwein...
    group_by(TierID) |>                                     # ...group it by TierID
    mutate(
        timelag = difftime(lead(DatetimeUTC), DatetimeUTC)  # Calculate difftime
    )
# A tibble: 10 × 3
# Groups: TierID [2]
TierID DatetimeUTC timelag
<chr> <dttm> <drtn>
1 Hans 2015-01-01 00:00:00 15 mins
2 Hans 2015-01-01 00:15:00 15 mins
3 Hans 2015-01-01 00:30:00 15 mins
4 Hans 2015-01-01 00:45:00 15 mins
5 Hans 2015-01-01 01:00:00 NA mins
6 Klara 2015-01-01 00:00:00 15 mins
7 Klara 2015-01-01 00:15:00 15 mins
8 Klara 2015-01-01 00:30:00 15 mins
9 Klara 2015-01-01 00:45:00 15 mins
10 Klara 2015-01-01 01:00:00 NA mins
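To illustrate what the pipe changes (and what it does not), the following sketch writes the same two steps once as nested calls and once as a pipeline, then checks that both give the same time lags. The dataset boar_demo and the names nested and piped are made up for this comparison.

```r
library("dplyr")

# Made-up mini dataset: two animals with three fixes each, 15 minutes apart
boar_demo <- tibble(
  TierID = rep(c("Hans", "Klara"), each = 3),
  DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:2 * 15 * 60, 2)
)

# Nested calls: have to be read from the inside out
nested <- mutate(
  group_by(boar_demo, TierID),
  timelag = difftime(lead(DatetimeUTC), DatetimeUTC)
)

# Piped calls: read from top to bottom
piped <- boar_demo |>
  group_by(TierID) |>
  mutate(timelag = difftime(lead(DatetimeUTC), DatetimeUTC))

# Both approaches produce the same timelag column
identical(nested$timelag, piped$timelag)
```

Note that a pipeline only returns its result; to keep it, the output has to be assigned to a name, as done with piped here.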
summarise()
If we want to summarise our data and get metrics per animal, we can use the dplyr function summarise(). In contrast to mutate(), which just adds a new column to the dataset, summarise() “collapses” the data to one row per individual (specified by group_by).
summarise(wildschwein, mean = mean(timelag, na.rm = TRUE))
# A tibble: 2 × 2
TierID mean
<chr> <drtn>
1 Hans 15 mins
2 Klara 15 mins
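summarise() is not limited to a single metric. As a sketch, the block below computes the number of positions together with the mean and maximum time lag (in seconds) per animal. The column names n_fixes, mean_timelag and max_timelag and the object names are chosen for this illustration only.

```r
library("dplyr")

# Same dummy data as above, under a hypothetical name
boar_demo <- tibble(
  TierID = c(rep("Hans", 5), rep("Klara", 5)),
  DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)

summary_demo <- boar_demo |>
  group_by(TierID) |>
  # numeric seconds, so that mean() and max() return plain numbers
  mutate(timelag = as.numeric(difftime(lead(DatetimeUTC), DatetimeUTC, units = "secs"))) |>
  summarise(
    n_fixes = n(),                              # number of rows per animal
    mean_timelag = mean(timelag, na.rm = TRUE), # ignore the trailing NA
    max_timelag = max(timelag, na.rm = TRUE)
  )

summary_demo
```

Because every group ends with an NA time lag, na.rm = TRUE is needed here; without it, both summary columns would be NA.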