This function calculates the empirical variogram of multi-dimensional tracking data for visualizing stationary (time-averaged) autocorrelation structure. One of two algorithms is used. The slow $$O(n^2)$$ algorithm is based upon Fleming et al (2014), but with interval weights instead of lag weights and an iterative algorithm to adjust for calibrated errors. Additional modifications have also been included to accommodate drift in the sampling rate. The fast $$O(n \log n)$$ algorithm is based upon the FFT method of Marcotte (1996), with some tweaks to better handle irregularly sampled data. Both methods reduce to the unbiased "method of moments" estimator in the case of evenly scheduled data, even with missing observations, but they produce slightly different outputs for irregularly sampled data.

```r
variogram(data, dt=NULL, fast=TRUE, res=1, CI="Markov", error=FALSE,
          axes=c("x","y"), precision=1/8)
```

## Arguments

- `data`: telemetry data object of the 2D timeseries data.
- `dt`: Lag bin width. An ordered array will yield a progressive coarsening of the lags. Defaults to the median sampling interval.
- `fast`: Use the interval-weighted algorithm if FALSE or the FFT algorithm if TRUE. The slow algorithm outputs a progress bar.
- `res`: Increase the discretization resolution for irregularly sampled data with res>1. Decreases bias at the cost of smoothness.
- `CI`: Argument for confidence-interval estimation. Can be "IID" to consider all unique lags as independent, "Markov" to consider only non-overlapping lags as independent, or "Gauss" for an exact calculation (see Details below).
- `error`: Adjust for the effect of calibrated errors.
- `axes`: Array of axes to calculate an average (isotropic) variogram for.
- `precision`: Fraction of machine precision to target when adjusting for telemetry error (fast=FALSE with calibrated errors). precision=1/8 returns about 2 decimal digits of precision.

## Details

If no dt is specified, the median sampling interval is used. This is typically a good assumption for most data, even when there are gaps. A dt coarser than the sampling interval may bias the variogram (particularly if fast=TRUE), and so this should be reserved for poor data quality.
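When the default does not suit the data, dt can be set explicitly. A minimal sketch using the buffalo data from the Examples below, and assuming ctmm's `%#%` unit-conversion operator for readability:

```r
library(ctmm)
data(buffalo)
Cilla <- buffalo$Cilla

# specify the lag bin width explicitly (here one hour, converted to seconds)
SVF <- variogram(Cilla, dt=1 %#% "hour")
```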

For irregularly sampled data, it may be useful to provide an array of time-lag bin widths to progressively coarsen the variogram. For instance, if you made the very bad choice of changing your sampling interval on the fly from dt1 to dt2, where dt1 $$<$$ dt2, then an appropriate choice would be dt=c(dt1,dt2). On the other hand, if your sampling is itself a noisy process, then you might want to introduce larger and larger dt components as the visual appearance of the variogram breaks down with increasing lags. Alternatively, you might try the fast=FALSE option or aggregating multiple individuals with mean.variogram.
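The progressive coarsening described above might look like the following, with hypothetical values assuming the sampling schedule switched from 1-hour to 3-hour intervals:

```r
library(ctmm)
data(buffalo)
Cilla <- buffalo$Cilla

# hypothetical: sampling switched from 1-hour to 3-hour intervals
dt1 <- 1 %#% "hour"
dt2 <- 3 %#% "hour"

# lag bins are progressively coarsened from dt1 to dt2
SVF <- variogram(Cilla, dt=c(dt1, dt2))
```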

With irregularly sampled data, different size lags must be aggregated together, and with current fast methods there is a tradeoff between bias and smoothness. The default settings produce a relatively smooth estimate, while increasing res (or setting fast=FALSE) will produce a less biased estimate, which is very useful for correlogram.
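Either side of this tradeoff can be selected with the arguments documented above; a sketch, reusing the buffalo data from the Examples:

```r
library(ctmm)
data(buffalo)
Cilla <- buffalo$Cilla

# less biased estimate for irregular data, at the cost of smoothness
SVF <- variogram(Cilla, res=2)

# or the slow, interval-weighted estimator
SVF <- variogram(Cilla, fast=FALSE)
```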

In conventional variogram regression treatments, all lags are considered as independent (CI="IID") for the purposes of confidence-interval estimation, even if they overlap in time. However, in high resolution datasets this will produce vastly underestimated confidence intervals. Therefore, the default CI="Markov" behavior is to consider only the maximum number of non-overlapping lags in calculating confidence intervals, though this is a crude approximation and is overly conservative at large lags. CI="Gauss" implements exact confidence intervals under the assumption of a stationary Gaussian process, but this algorithm is $$O(n^2 \log n)$$ even when fast=TRUE.
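The three confidence-interval approximations can be compared directly; a sketch, noting that CI="Gauss" is $$O(n^2 \log n)$$ and may be slow on large datasets:

```r
library(ctmm)
data(buffalo)
Cilla <- buffalo$Cilla

SVF.iid    <- variogram(Cilla, CI="IID")    # overly optimistic at short lags
SVF.markov <- variogram(Cilla, CI="Markov") # default; conservative at large lags
SVF.gauss  <- variogram(Cilla, CI="Gauss")  # exact, but costly
```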

If fast=FALSE and the tracking data are calibrated (see uere), then with error=TRUE the variogram of the movement process (sans the telemetry-error process) is estimated using an iterative maximum-likelihood estimator that downweights more erroneous location estimates (Fleming et al, 2020). The variogram is targeted to have precision fraction of machine precision. If the data are very irregular and location errors are very homoskedastic, then this algorithm can be slow to converge at time lags where there are few data pairs. If fast=TRUE and error=TRUE, then the estimated contribution to the variogram from location error is subtracted on a per-lag basis, which is less ideal for heteroskedastic errors.
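A sketch of the error-adjusted estimator, assuming the data have already been calibrated with uere (the calibration step itself is documented in help("uere") and is not shown here):

```r
library(ctmm)
data(buffalo)
Cilla <- buffalo$Cilla

# assuming Cilla has calibrated location errors (see help("uere")):
# iterative maximum-likelihood adjustment for telemetry error
SVF <- variogram(Cilla, error=TRUE, fast=FALSE, precision=1/8)
```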

## Value

Returns a variogram object (class variogram), which is a data frame containing the time lag, `lag`, the semi-variance estimate at that lag, `SVF`, and the approximate number of degrees of freedom associated with that semi-variance, `DOF`, from which its confidence intervals can be estimated.
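The returned columns can be inspected like any data frame; a sketch:

```r
library(ctmm)
data(buffalo)
Cilla <- buffalo$Cilla

SVF <- variogram(Cilla)
head(SVF$lag)  # time lags
head(SVF$SVF)  # semi-variance estimates at each lag
head(SVF$DOF)  # degrees of freedom for confidence intervals
```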

## References

D. Marcotte, "Fast variogram computation with FFT", Computers and Geosciences 22:10, 1175-1186 (1996). doi:10.1016/S0098-3004(96)00026-X

C. H. Fleming, J. M. Calabrese, T. Mueller, K. A. Olson, P. Leimgruber, W. F. Fagan, "From fine-scale foraging to home ranges: A semi-variance approach to identifying movement modes across spatiotemporal scales", The American Naturalist 183:5, E154-E167 (2014). doi:10.1086/675504

C. H. Fleming et al, "A comprehensive framework for handling location error in animal tracking data", bioRxiv (2020). doi:10.1101/2020.06.12.130195

## Author

C. H. Fleming and J. M. Calabrese.

## Note

Prior to ctmm v0.3.6, fast=FALSE used the lag-weighted estimator of Fleming et al (2014). Lag weights have been abandoned in favor of interval weights, which are less sensitive to sampling irregularity. The same weighting formulas are used, but with dt instead of the current lag.

## See Also

vignette("variogram"), correlogram, mean.variogram, plot.variogram, variogram.fit.

## Examples

```r
# Load package and data
library(ctmm)
data(buffalo)

# Extract movement data for a single animal
Cilla <- buffalo$Cilla

# Calculate variogram
SVF <- variogram(Cilla)

# Plot the variogram with 50% and 95% CIs
plot(SVF, level=c(0.5, 0.95))
```