Demo

Download this Demoscript via “</>Code” (top right)

Depending on your knowledge of R, getting an overview of the data we imported last week might have been quite a challenge. Surprisingly enough, importing, cleaning and exploring your data can be the most challenging, time consuming part of a project. RStudio and the tidyverse offer many helpful tools to make this part easier (and more fun). You have read chapters on dplyr and magrittr as a preparation for this exercise. Before we start with the exercise however, this demo illustrates a simple approach offered by tidyverse which is applicable to sf-objects.

Assume we want to calculate the timelag between subsequent positions. To achieve this we can use the function difftime() combined with lead() from dplyr. Let’s look at these functions one by one.

`difftime`

difftime takes two POSIXct values.

now <- as.POSIXct("2024-04-26 10:20:00")
later <- as.POSIXct("2024-04-26 11:35:00")

later <- now + 10000

later

[1] "2024-04-26 13:06:40 CEST"

time_difference <- difftime(later, now)

time_difference

Time difference of 2.777778 hours

You can also specify the unit of the output.

time_difference <- difftime(later, now, units = "secs")

time_difference

Time difference of 10000 secs

difftime returns an object of the class difftime.

class(time_difference)
## [1] "difftime"

str(time_difference)
##  'difftime' num 10000
##  - attr(*, "units")= chr "secs"

However in our case, numeric values would be more handy than the class difftime. So we’ll wrap the command in as.numeric():

time_difference <- as.numeric(difftime(later, now, units = "secs"))

str(time_difference)
##  num 10000
class(time_difference)
## [1] "numeric"

In fact, we will use this exact operation multiple times, so let’s create a function for this:

difftime_secs <- function(later, now){
    as.numeric(difftime(later, now, units = "secs"))
}

`lead()` / `lag()`

lead() and lag() return a vector of the same length as the input, just offset by a specific number of values (default is 1). Consider the following sequence:

numbers <- 1:10

numbers

 [1]  1  2  3  4  5  6  7  8  9 10

We can now run lead() and lag() on this sequence to illustrate the output. n = specifies the offset, default = specifies the default value used to “fill” the emerging “empty spaces” of the vector. This helps us performing operations on subsequent values in a vector (or rows in a table).

library("dplyr")

lead(numbers)

 [1]  2  3  4  5  6  7  8  9 10 NA

lead(numbers, n = 2)

 [1]  3  4  5  6  7  8  9 10 NA NA

lag(numbers)

 [1] NA  1  2  3  4  5  6  7  8  9

lag(numbers, n = 5)

 [1] NA NA NA NA NA  1  2  3  4  5

lag(numbers, n = 5, default = 0)

 [1] 0 0 0 0 0 1 2 3 4 5

`mutate()`

Using the above functions (difftime() and lead()), we can calculate the time lag, that is, the time difference between consecutive positions. We will try this on a dummy version of our wild boar dataset.

wildschwein <- tibble(
    TierID = c(rep("Hans", 5), rep("Klara", 5)),
    DatetimeUTC = rep(as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 0:4 * 15 * 60, 2)
)

wildschwein

# A tibble: 10 × 2
   TierID DatetimeUTC        
   <chr>  <dttm>             
 1 Hans   2015-01-01 00:00:00
 2 Hans   2015-01-01 00:15:00
 3 Hans   2015-01-01 00:30:00
 4 Hans   2015-01-01 00:45:00
 5 Hans   2015-01-01 01:00:00
 6 Klara  2015-01-01 00:00:00
 7 Klara  2015-01-01 00:15:00
 8 Klara  2015-01-01 00:30:00
 9 Klara  2015-01-01 00:45:00
10 Klara  2015-01-01 01:00:00

If we are interested to calculate the speed travelled between subsequent locations, we need to calculate the elapsed time first. Since R does most operations in a vectorized manner, we can use difftime_secs on the entire column DatetimeUTC of our dataframe wildschwein and store the output in a new column.

now <- wildschwein$DatetimeUTC
later <- lead(now)

wildschwein$timelag <- difftime_secs(later, now)

wildschwein

# A tibble: 10 × 3
   TierID DatetimeUTC         timelag
   <chr>  <dttm>                <dbl>
 1 Hans   2015-01-01 00:00:00     900
 2 Hans   2015-01-01 00:15:00     900
 3 Hans   2015-01-01 00:30:00     900
 4 Hans   2015-01-01 00:45:00     900
 5 Hans   2015-01-01 01:00:00   -3600
 6 Klara  2015-01-01 00:00:00     900
 7 Klara  2015-01-01 00:15:00     900
 8 Klara  2015-01-01 00:30:00     900
 9 Klara  2015-01-01 00:45:00     900
10 Klara  2015-01-01 01:00:00      NA

However, we have an issue at the transion between the two animals. We can overcome this issue using dplyr’s mutate with group_by. If we use mutate, we do not use the $ notation!

# note the lack of "$"
wildschwein <- mutate(wildschwein, timelag = difftime_secs(lead(DatetimeUTC), DatetimeUTC))

wildschwein

# A tibble: 10 × 3
   TierID DatetimeUTC         timelag
   <chr>  <dttm>                <dbl>
 1 Hans   2015-01-01 00:00:00     900
 2 Hans   2015-01-01 00:15:00     900
 3 Hans   2015-01-01 00:30:00     900
 4 Hans   2015-01-01 00:45:00     900
 5 Hans   2015-01-01 01:00:00   -3600
 6 Klara  2015-01-01 00:00:00     900
 7 Klara  2015-01-01 00:15:00     900
 8 Klara  2015-01-01 00:30:00     900
 9 Klara  2015-01-01 00:45:00     900
10 Klara  2015-01-01 01:00:00      NA

The output is equivalent, we need group_by as well.

`group_by()`

To distinguish groups in a dataframe, we need to specify these using group_by().

# again, note the lack of "$"
wildschwein <- group_by(wildschwein, TierID)

After adding this grouping variable, calculating the timelag automatically accounts for the individual trajectories.

# again, note the lack of "$"
wildschwein <- mutate(wildschwein, timelag = difftime(lead(DatetimeUTC), DatetimeUTC))

wildschwein

# A tibble: 10 × 3
# Groups:   TierID [2]
   TierID DatetimeUTC         timelag
   <chr>  <dttm>              <drtn> 
 1 Hans   2015-01-01 00:00:00 15 mins
 2 Hans   2015-01-01 00:15:00 15 mins
 3 Hans   2015-01-01 00:30:00 15 mins
 4 Hans   2015-01-01 00:45:00 15 mins
 5 Hans   2015-01-01 01:00:00 NA mins
 6 Klara  2015-01-01 00:00:00 15 mins
 7 Klara  2015-01-01 00:15:00 15 mins
 8 Klara  2015-01-01 00:30:00 15 mins
 9 Klara  2015-01-01 00:45:00 15 mins
10 Klara  2015-01-01 01:00:00 NA mins

Piping

Piping can simplify the process and help us write our sequence of operations in a manner as we would explain them to another human being.

In order to make code readable in a more human-friendly way, we can use the piping command (|> or %>%, it does not matter which).

wildschwein |>                                            # Take wildschwein...
    group_by(TierID) |>                                   # ...group it by TierID
    mutate(
        timelag = difftime(lead(DatetimeUTC), DatetimeUTC)# Caculate difftime
        )

# A tibble: 10 × 3
# Groups:   TierID [2]
   TierID DatetimeUTC         timelag
   <chr>  <dttm>              <drtn> 
 1 Hans   2015-01-01 00:00:00 15 mins
 2 Hans   2015-01-01 00:15:00 15 mins
 3 Hans   2015-01-01 00:30:00 15 mins
 4 Hans   2015-01-01 00:45:00 15 mins
 5 Hans   2015-01-01 01:00:00 NA mins
 6 Klara  2015-01-01 00:00:00 15 mins
 7 Klara  2015-01-01 00:15:00 15 mins
 8 Klara  2015-01-01 00:30:00 15 mins
 9 Klara  2015-01-01 00:45:00 15 mins
10 Klara  2015-01-01 01:00:00 NA mins

`summarise()`

If we want to summarise our data and get metrics per animal, we can use the dplyr function summarise(). In contrast to mutate(), which just adds a new column to the dataset, summarise() “collapses” the data to one row per individual (specified by group_by).

summarise(wildschwein, mean = mean(timelag, na.rm = TRUE))

# A tibble: 2 × 2
  TierID mean   
  <chr>  <drtn> 
1 Hans   15 mins
2 Klara  15 mins

difftime

lead() / lag()

mutate()

group_by()