Depending on your knowledge of R, getting an overview of the data we imported last week might have been quite a challenge. Surprisingly enough, importing, cleaning and exploring your data can be the most challenging, time-consuming part of a project. RStudio and the tidyverse offer many helpful tools to make this part easier (and more fun). You have read chapters on dplyr and magrittr as a preparation for this exercise. Before we start with the exercise, however, this demo illustrates a simple approach offered by the tidyverse which is also applicable to sf objects.
Assume we want to calculate the time lag between subsequent positions. To achieve this we can use the function difftime() combined with lead() from dplyr. Let’s look at these functions one by one.
difftime() takes two POSIXct values.
now <- as.POSIXct("2024-04-26 10:20:00")
later <- as.POSIXct("2024-04-26 11:35:00")
later
[1] "2024-04-26 11:35:00 CEST"
time_difference <- difftime(later, now)
time_difference
Time difference of 1.25 hours
You can also specify the unit of the output.
time_difference <- difftime(later, now, units = "secs")
time_difference
Time difference of 4500 secs
difftime() returns an object of the class difftime.
class(time_difference)
## [1] "difftime"
str(time_difference)
## 'difftime' num 4500
## - attr(*, "units")= chr "secs"
However, in our case, plain numeric values would be handier than the class difftime. So we’ll wrap the command in as.numeric():
time_difference <- as.numeric(difftime(later, now, units = "secs"))
str(time_difference)
## num 4500
class(time_difference)
## [1] "numeric"
In fact, we will use this exact operation multiple times, so let’s create a function for this:
difftime_secs <- function(later, now){
    # time difference between two POSIXct values, as plain seconds
    as.numeric(difftime(later, now, units = "secs"))
}
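Why fix the unit inside the function? When no unit is given, difftime() picks one that suits the values, so the bare number can silently change scale. The following is a minimal sketch of this behaviour; the variables start, short_gap and long_gap are made up for this illustration.

```r
difftime_secs <- function(later, now) {
    as.numeric(difftime(later, now, units = "secs"))
}

# Hypothetical reference time (time zone fixed to keep the example reproducible)
start <- as.POSIXct("2024-04-26 10:20:00", tz = "UTC")

# Without `units`, difftime() chooses the unit automatically
short_gap <- difftime(start + 90, start)    # 90 seconds apart
long_gap <- difftime(start + 4500, start)   # 75 minutes apart
units(short_gap)  # "mins"
units(long_gap)   # "hours"

# Dropping the class now yields numbers on different scales
as.numeric(short_gap)  # 1.5 (minutes)
as.numeric(long_gap)   # 1.25 (hours)

# With the unit fixed, both results are in seconds
in_seconds <- difftime_secs(start + c(90, 4500), start)
in_seconds  # 90 4500
```

Because the helper always returns seconds, its output can be compared and combined across rows without checking the unit first.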
lead() / lag()
lead() and lag() return a vector of the same length as the input, just offset by a specific number of values (default is 1). Consider the following sequence:
numbers <- 1:10
numbers
[1] 1 2 3 4 5 6 7 8 9 10
We can now run lead() and lag() on this sequence to illustrate the output. The argument n = specifies the offset, and default = specifies the value used to “fill” the emerging “empty spaces” of the vector. This helps us perform operations on subsequent values in a vector (or rows in a table).
library("dplyr")
lead(numbers)
[1] 2 3 4 5 6 7 8 9 10 NA
lead(numbers, n = 2)
[1] 3 4 5 6 7 8 9 10 NA NA
lag(numbers)
[1] NA 1 2 3 4 5 6 7 8 9
lag(numbers, n = 5)
[1] NA NA NA NA NA 1 2 3 4 5
lag(numbers, n = 5, default = 0)
[1] 0 0 0 0 0 1 2 3 4 5
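To see why the offset is useful, here is a small sketch that subtracts a vector from its shifted copy to get the change between neighbouring values. The vector distance and its values are invented for this illustration.

```r
library("dplyr")

# Hypothetical cumulative distances (in metres) at four consecutive fixes
distance <- c(3, 7, 12, 20)

# Forward difference: next value minus current value
forward <- lead(distance) - distance
forward   # 4 5 8 NA

# Backward difference: current value minus previous value
backward <- distance - lag(distance)
backward  # NA 4 5 8

# Base R's diff() computes the same steps but returns one element fewer
steps <- diff(distance)
steps     # 4 5 8
```

Since lead() and lag() keep the length of the input, their result can be stored directly as a new column of the same table, which diff() does not allow without padding.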
Using the above functions (difftime() and lead()), we can calculate the time lag, that is, the time difference between consecutive positions. We will try this on a dummy version of our wild boar dataset.
wildschwein <- tibble( # aka data.frame
    TierID = rep(c("Hans", "Klara"), each = 5),
    DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)
wildschwein
# A tibble: 10 × 2
TierID DatetimeUTC
<chr> <dttm>
1 Hans 2015-01-01 00:00:00
2 Hans 2015-01-01 00:15:00
3 Hans 2015-01-01 00:30:00
4 Hans 2015-01-01 00:45:00
5 Hans 2015-01-01 01:00:00
6 Klara 2015-01-01 00:00:00
7 Klara 2015-01-01 00:15:00
8 Klara 2015-01-01 00:30:00
9 Klara 2015-01-01 00:45:00
10 Klara 2015-01-01 01:00:00
If we are interested in calculating the speed travelled between subsequent locations, we need to calculate the elapsed time first. Since R does most operations in a vectorized manner, we can use difftime_secs() on the entire column DatetimeUTC of our dataframe wildschwein and store the output in a new column.
now <- wildschwein$DatetimeUTC
later <- lead(now)
# View(wildschwein)
wildschwein$timelag <- difftime_secs(later, now)
wildschwein
# A tibble: 10 × 3
TierID DatetimeUTC timelag
<chr> <dttm> <dbl>
1 Hans 2015-01-01 00:00:00 900
2 Hans 2015-01-01 00:15:00 900
3 Hans 2015-01-01 00:30:00 900
4 Hans 2015-01-01 00:45:00 900
5 Hans 2015-01-01 01:00:00 -3600
6 Klara 2015-01-01 00:00:00 900
7 Klara 2015-01-01 00:15:00 900
8 Klara 2015-01-01 00:30:00 900
9 Klara 2015-01-01 00:45:00 900
10 Klara 2015-01-01 01:00:00 NA
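However, we have an issue at the transition between the two animals: the time lag of Hans’s last position is computed against Klara’s first position, which gives -3600 seconds. We can overcome this issue using dplyr’s mutate() with group_by(). If we use mutate(), we refer to columns by name and do not need the $ operator.
# note the lack of "$"
wildschwein <- mutate(wildschwein, timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))
wildschwein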
# A tibble: 10 × 3
TierID DatetimeUTC timelag
<chr> <dttm> <dbl>
1 Hans 2015-01-01 00:00:00 900
2 Hans 2015-01-01 00:15:00 900
3 Hans 2015-01-01 00:30:00 900
4 Hans 2015-01-01 00:45:00 900
5 Hans 2015-01-01 01:00:00 -3600
6 Klara 2015-01-01 00:00:00 900
7 Klara 2015-01-01 00:15:00 900
8 Klara 2015-01-01 00:30:00 900
9 Klara 2015-01-01 00:45:00 900
10 Klara 2015-01-01 01:00:00 NA
The output is equivalent, so the issue at the transition remains: mutate() alone is not enough, we need group_by() as well.
To distinguish groups in a dataframe, we need to specify them using group_by().
# again, note the lack of "$"
wildschwein <- group_by(wildschwein, TierID)
After adding this grouping variable, calculating timelag automatically accounts for the individual trajectories.
# again, note the lack of "$"
wildschwein <- mutate(wildschwein, timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))
wildschwein
# A tibble: 10 × 3
# Groups: TierID [2]
TierID DatetimeUTC timelag
<chr> <dttm> <dbl>
1 Hans 2015-01-01 00:00:00 900
2 Hans 2015-01-01 00:15:00 900
3 Hans 2015-01-01 00:30:00 900
4 Hans 2015-01-01 00:45:00 900
5 Hans 2015-01-01 01:00:00 NA
6 Klara 2015-01-01 00:00:00 900
7 Klara 2015-01-01 00:15:00 900
8 Klara 2015-01-01 00:30:00 900
9 Klara 2015-01-01 00:45:00 900
10 Klara 2015-01-01 01:00:00 NA
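Note that lead() simply looks at the next row, so the grouped calculation only gives meaningful time lags if the rows of each animal are in chronological order. The dummy data above is already sorted. The sketch below uses a deliberately shuffled, made-up table (fixes) and sorts it with arrange() first; treat it as an illustration under these assumptions, not as a required step for the dataset above.

```r
library("dplyr")

difftime_secs <- function(later, now) {
    as.numeric(difftime(later, now, units = "secs"))
}

# Hypothetical, deliberately shuffled fixes of two animals
start <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")
fixes <- tibble(
    TierID = c("Klara", "Hans", "Hans", "Klara", "Hans", "Klara"),
    DatetimeUTC = start + c(30, 15, 0, 0, 30, 15) * 60
)

# Sort by animal and time, because lead() depends on row order
fixes <- arrange(fixes, TierID, DatetimeUTC)

# Group, then calculate the time lag within each animal
fixes <- group_by(fixes, TierID)
fixes <- mutate(fixes, timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))

fixes$timelag  # 900 900 NA 900 900 NA
```

Each animal ends with NA because its last position has no successor within the group.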
Piping can simplify the process and help us write our sequence of operations in the way we would explain them to another human being.
To make code more readable, we can use the pipe operator (|> or %>%; for our purposes it does not matter which).
wildschwein |>                        # Take wildschwein...
    group_by(TierID) |>               # ...group it by TierID
    mutate(                           # ...and add a new column
        timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC) # Calculate difftime
    )
# A tibble: 10 × 3
# Groups: TierID [2]
TierID DatetimeUTC timelag
<chr> <dttm> <dbl>
1 Hans 2015-01-01 00:00:00 900
2 Hans 2015-01-01 00:15:00 900
3 Hans 2015-01-01 00:30:00 900
4 Hans 2015-01-01 00:45:00 900
5 Hans 2015-01-01 01:00:00 NA
6 Klara 2015-01-01 00:00:00 900
7 Klara 2015-01-01 00:15:00 900
8 Klara 2015-01-01 00:30:00 900
9 Klara 2015-01-01 00:45:00 900
10 Klara 2015-01-01 01:00:00 NA
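The pipe only changes how the code reads, not what it computes. As a small sketch of this, the following compares a nested version of the same operations with the piped version; the object names nested and piped are chosen for this illustration.

```r
library("dplyr")

difftime_secs <- function(later, now) {
    as.numeric(difftime(later, now, units = "secs"))
}

wildschwein <- tibble(
    TierID = rep(c("Hans", "Klara"), each = 5),
    DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)

# Nested calls: have to be read from the inside out
nested <- mutate(
    group_by(wildschwein, TierID),
    timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC)
)

# Piped calls: can be read from top to bottom
piped <- wildschwein |>
    group_by(TierID) |>
    mutate(timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))

# Both versions produce the same time lags and the same grouping
identical(nested$timelag, piped$timelag)
identical(group_vars(nested), group_vars(piped))
```

Both comparisons should return TRUE. The native pipe |> requires R 4.1 or newer; with older versions, %>% from magrittr can be used in the same way here.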
If we want to summarise our data and get metrics per animal, we can use the dplyr function summarise(). In contrast to mutate(), which just adds a new column to the dataset, summarise() “collapses” the data to one row per individual (specified by group_by()).
summarise(wildschwein, mean = mean(timelag, na.rm = TRUE))
# A tibble: 2 × 2
TierID mean
<chr> <dbl>
1 Hans 900
2 Klara 900
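summarise() can also compute several metrics in one call. The sketch below rebuilds the dummy data and reports the number of positions together with the mean and maximum time lag per animal; the column names n_fixes, mean_timelag and max_timelag are made up for this example.

```r
library("dplyr")

difftime_secs <- function(later, now) {
    as.numeric(difftime(later, now, units = "secs"))
}

wildschwein <- tibble(
    TierID = rep(c("Hans", "Klara"), each = 5),
    DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)

timelag_summary <- wildschwein |>
    group_by(TierID) |>
    mutate(timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC)) |>
    summarise(
        n_fixes = n(),                              # rows per animal
        mean_timelag = mean(timelag, na.rm = TRUE), # ignore the trailing NA
        max_timelag = max(timelag, na.rm = TRUE)
    )

timelag_summary
```

With the regular 15-minute dummy data, every animal has 5 positions and both the mean and the maximum time lag are 900 seconds; with real data, a maximum far above the mean would hint at gaps in the sampling.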