ctmm.fit.Rd
These functions allow one to propose hypothetical movement models (with initial estimates), fit those models to the data, and select among those models via an information criterion.
The fitting functions wrap around optim
and ctmm.loglike
to fit continuous-time movement models to 2D animal tracking data as described in Fleming et al (2014) and Fleming et al (2015), and Fleming et al (2017).
ctmm(tau=NULL,omega=FALSE,isotropic=FALSE,range=TRUE,circle=FALSE,error=FALSE,
axes=c("x","y"),...)
ctmm.loglike(data,CTMM,REML=FALSE,profile=TRUE,zero=0,verbose=FALSE,compute=TRUE,...)
ctmm.fit(data,CTMM=ctmm(),method="pHREML",COV=TRUE,control=list(),trace=FALSE)
ctmm.select(data,CTMM,verbose=FALSE,level=1,IC="AICc",MSPE="position",trace=FALSE,cores=1,
...)
Array of autocorrelation timescales (explained below).
Frequency (\(2\pi/period\)) of oscillatory range crossings.
A Boolean denoting whether or not the animal's covariance is circular or elliptical.
A Boolean denoting whether or not the movement model has a finite range.
(\(2\pi\) divided by) the period it takes the animal to stochastically circle its mean location.
A Boolean denoting whether or not to use annotated telemetry error estimates or an estimate of the error's standard deviation if the data are not annotated with error estimates or when \(HDOP=1\).
Spatial dimensions of the movement model.
Timeseries data represented as a telemetry
object.
A ctmm
movement-model object containing the initial parameter guesses conforming to the basic structure of the model hypothesis. ctmm.select
can accept a list of such objects.
Use residual maximum likelihood if TRUE
. Not recommended.
Analytically solve for as many covariance parameters as possible.
Calculates \(log(likelihood) - zero\), instead of just \(log(likelihood)\), in a way that maintains numerical precision if the constant zero
is close to the log likelihood. Used internally by ctmm.fit
.
Return additional information. See "Value" below.
Only return computational information if FALSE
.
Fitting method to use: "ML"
, "HREML"
, "pREML"
, "pHREML"
, or "REML"
. See "Description" below.
Estimate the autocorrelation parameter covariance matrix.
An optional argument list for optimizer
.
Report progress updates. Can be among 0:3
with increasing detail.
Attempt to simplify a model if a feature's non-existence falls within this level of confidence interval.
Information criterion used for selection. Can be "AICc"
, "AIC"
, "BIC"
, "LOOCV"
, "HSCV"
, or none (NA
). AICc is approximate.
Reject non-stationary features that increase the mean square predictive error of "position"
, "velocity"
, or not (NA
).
Maximum number of models to fit in parallel. cores=0 will use all cores, while cores<0 will reserve abs(cores).
Further arguments passed to ctmm.fit
.
Model fitting and selection first requires a prototype model with guesstimated parameters (i.e., Brownian motion with a particular diffusion rate).
The initial ctmm
parameter guess can be generated by the output of ctmm.guess
, variogram.fit
or manually specified with the function ctmm(...)
, where the argument tau
is explained below and additional model options described in vignette("ctmm")
.
By default, tau
(\(\tau\)) is an ordered array of autocorrelation timescales.
If length(tau)==0
, then an IID bi-variate Gaussian model is fit to the data.
If length(tau)==1
, then an Ornstein-Uhlenbeck (OU) model (Brownian motion restricted to a finite home range) is fit the data, where tau
is the position autocorrelation timescale. tau=Inf
then yields Brownian motion (BM).
If length(tau)==2
, then the OUF model (continuous-velocity motion restricted to a finite home range) is fit to the data, where tau[1]
is again the position autocorrelation timescale and tau[2]
is the velocity autocorrelation timescale. tau[1]=Inf
then yields integrated Ornstein-Uhlenbeck (IOU) motion, which is a spatially unrestricted continuous-velocity process.
Two new models were introduced in ctmm version 0.5.2 for the case of tau[1]==tau[2]
, which can happen with short tracks of data. When tau[1]==tau[2]
and omega==0
, the model is categorized as OUf---a special case of OUF---and the two tau
parameters are treated as identical. On the other hand, when tau[1]==tau[2]
and omega>0
, an oscillatory model is implemented, which we refer to as OU\(\Omega\).
The potential fitting methods---maximum likelihood (ML
), residual maximum likelihood (REML
), perturbative REML (pREML
), hybrid REML (HREML
), and perturbative hybrid REML (pHREML
)---are described in Fleming et al (2019). In general, pHREML
is the best method, though when parameter estimates lie near boundaries it can fail, in which case ctmm.fit
will fall back to HREML
, as reported by the method
slot of the resulting fit object.
The control
list can take the following arguments, with defaults shown:
method="pNewton"
The partial-Newton method of optimizer
is default. See optim
for alternative methods in multiple dimensions.
precision=1/2
Fraction of machine numerical precision to target in the maximized likelihood value. MLEs will necessarily have half this precision. On most computers, precision=1
is approximately 16 decimal digits of precision for the likelihood and 8 for the MLEs.
maxit=.Machine$integer.max
Maximum number of iterations allowed for optimization.
Model selection in ctmm.select
proceeds in two phases. If there are a large number of parameters that must be fit numerically (such as when error is modeled), then the target model (argument CTMM
) is worked toward by first fitting simpler, compatible models. The second phase proceeds by attempting to simplify the autocorrelation model and complexify the deterministic (trend) model until the information criterion fails to improve. The intent of working in these directions is to improve numerical convergence and avoid fitting trends to autocorrelation.
Note that simpler models in a nested hierarchy will only be attempted if they appear credible, which can be adjusted with the level
argument. level=1
will, therefore, always attempt a simpler model.
The leave-one-out cross validation IC, IC="LOOCV"
, is (-2 times) the sum of log-likelihoods of the validation data, after fitting to and conditioning on the training data. This information criterion is intended for small amounts of data where AIC/BIC are not valid, and where the questions of interest are targeted at the finest scales of the data, such as speed or occurrence. Unlike other model-selection criteria, the computational complexity of LOOCV is \(O(n^2)\), which is very slow for sample sizes on the order of 10-100 thousand locations. Furthermore, as autocorrelation in the validation data is ignored, this information criterion is not valid for making inferences at scales coarser than the sampling interval, such as home range.
The half-sample cross validation IC, IC="HSCV"
, is (-2 times) the sum of log-likelihoods of the validation data, after fitting to and conditioning on the training data consisting of the first (and second) halves of the data when split temporally. This information criterion is intended for when few range crossings are observed and AIC/BIC may not be valid.
The function ctmm
returns a prototype ctmm
movement-model object.
By default, ctmm.loglike
returns the log-likelihood of the model CTMM
.
ctmm.fit
(and ctmm.loglike
with verbose=TRUE
) returns the estimated ctmm
movement-model object with all of the components of CTMM
plus the components listed below.
ctmm.select
returns the best model by default, or the sorted list of attempted models if verbose=TRUE
, with the best model being first in the list.
AICc
The approximate corrected Akaike information criterion for multivariate distributions with variable numbers of unknown mean and (structured) covariance parameters (Burnham & Anderson, Eq. 7.91). This formula is only exact for IID data.
loglike
The log-likelihood.
sigma
The maximum likelihood variance/covariance estimate (possibly debiased). For the endlessly diffusing BM and IOU processes, this is instead the diffusion rate estimate.
mu
The maximum likelihood stationary mean vector estimate.
COV.mu
The covariance matrix of the mu
estimate, assuming that the covariance estimate is correct.
DOF.mu
The effective number of degrees of freedom in the estimate of mu
, assuming that the autocorrelation model is correct. This can be much smaller than length(data$t)
if the data are autocorrelated.
COV
Covariance of the autocovariance parameter estimate vector c(sigma,tau,circle)
, as derived (asymptotically) from the hessian
of the log-likelihood function, and where sigma
is parameterized in terms of its largest variance major
, the ratio of the smallest to largest variance minor
, and angle
of orientation. Typically, sigma
's major
parameter is extremely correlated to tau[1]
, and sequential components of tau
are slightly correlated.
The warning "MLE is near a boundary or optim() failed" indicates that you should be using ctmm.select
rather than ctmm.fit
, because some features are not well supported by the data.
The warning "pREML failure: indefinite ML Hessian" is normal if some autocorrelation parameters cannot be well resolved.
K. P. Burnham, D. R. Anderson, ``Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach'', Springer, 2nd edition (2003).
C. H. Fleming, J. M. Calabrese, T. Mueller, K.A. Olson, P. Leimgruber, W. F. Fagan, ``From fine-scale foraging to home ranges: A semi-variance approach to identifying movement modes across spatiotemporal scales'', The American Naturalist, 183:5, E154-E167 (2014) doi:10.1086/675504 .
C. H. Fleming, Y. Subaşı, J. M. Calabrese, ``A maximum-entropy description of animal movement'', Physical Review E, 91, 032107 (2015) doi:10.1103/PhysRevE.91.032107 .
C. H. Fleming, D. Sheldon, E. Gurarie, W. F. Fagan, S. LaPoint, J. M. Calabrese, ``Kálmán filters for continuous-time movement models'', Ecological Informatics, 40, 8-21 (2017) doi:10.1016/j.ecoinf.2017.04.008 .
C. H. Fleming, M. J. Noonan, E. P. Medici, J. M. Calabrese, ``Overcoming the challenge of small effective sample sizes in home-range estimation'', Methods in Ecology and Evolution 10:10, 1679-1689 (2019) doi:10.1111/2041-210X.13270 .
The default optimization method in ctmm
v0.5.7 and above is optimizer
's "pNewton"
. Annecdotally, on these problems, optimizer
's pNewton
method generally outperforms optim
's "Nelder-Mead"
, which generally outperforms optim
's "BFGS"
and "L-BFGS-B"
methods. With default arguments, "pNewton"
is about half as fast as "Nelder-Mead"
, but is resolving about twice as much numerical precision by default.
The AICs/BICs of endlessly diffusing models like BM and IOU cannot be easily compared to the AICs/BICs of range resident models like bivariate Gaussian, OU, and OUF, as their joint likelihood functions are infinitely different. Endlessly diffusing models have to be conditioned off of an initial state, which we derive in ctmm
by taking the large range limit of a range-restricted process. I.e., BM is the limit OU(Inf
) and IOU(tau
) is the limit OUF(Inf
,tau
). Using comparable likelihood functions gives up statistical efficiency and the objective prior. Moreover, comparing conditional likelihoods---with the objective prior taken from the joint likelihood---does not appear to select the true model with a likelihood ratio test. Different criteria must be used to select between range resident and endlessly diffusing movement models.
Prior to v0.3.6, the univariate AICc formula was (mis)used, with the full parameter count treated as degrees of freedom in the mean. As of v.0.3.6, the mean and autocovariance parameters are treated separately in the approximate multivariate AICc formula (Burnham & Anderson, Eq. 7.91). Still, this improved formula is only exact for IID data.
Prior to v0.3.2, ctmm.select
would consider every possible model.
This is no longer feasible with current versions of ctmm
, as the number of possible models has grown too large.
# \donttest{
# Load package and data
library(ctmm)
data(buffalo)
DATA <- buffalo$Cilla
GUESS <- ctmm.guess(DATA,interactive=FALSE)
# in general, you want to run ctmm.select instead
FIT <- ctmm.fit(DATA,GUESS)
# some human-readable information
summary(FIT)
#> $name
#> [1] "OUF anisotropic"
#>
#> $DOF
#> mean area diffusion speed
#> 10.73355 18.13604 902.23378 3445.13305
#>
#> $CI
#> low est high
#> area (square kilometers) 239.647430 403.458423 609.240189
#> τ[position] (days) 4.438957 7.505360 12.690014
#> τ[velocity] (minutes) 39.607116 42.069009 44.683929
#> speed (kilometers/day) 13.820458 14.055146 14.289780
#> diffusion (square kilometers/day) 5.284545 5.647059 6.021428
#>
# }