`ctmm.fit.Rd`

These functions allow one to propose hypothetical movement models (with initial estimates), fit those models to the data, and select among those models via an information criterion.
The fitting functions wrap around `optim`

and `ctmm.loglike`

to fit continuous-time movement models to 2D animal tracking data as described in Fleming et al (2014) and Fleming et al (2015), and Fleming et al (2017).

ctmm(tau=NULL,omega=FALSE,isotropic=FALSE,range=TRUE,circle=FALSE,error=FALSE, axes=c("x","y"),...) ctmm.loglike(data,CTMM,REML=FALSE,profile=TRUE,zero=0,verbose=FALSE) ctmm.fit(data,CTMM=ctmm(),method="pHREML",COV=TRUE,control=list(),trace=FALSE) ctmm.select(data,CTMM,verbose=FALSE,level=1,IC="AICc",MSPE="position",trace=FALSE,cores=1, ...)

tau | Array of autocorrelation timescales (explained below). |
---|---|

omega | Frequency (\(2\pi/period\)) of oscillatory range crossings. |

isotropic | A Boolean denoting whether or not the animal's covariance is circular or elliptical. |

range | A Boolean denoting whether or not the movement model has a finite range. |

circle | (\(2\pi\) divided by) the period it takes the animal to stochastically circle its mean location. |

error | A Boolean denoting whether or not to use annotated telemetry error estimates or an estimate of the error's standard deviation if the data are not annotated with error estimates or when \(HDOP=1\). |

axes | Spatial dimensions of the movement model. |

data | Timeseries data represented as a |

CTMM | A |

REML | Use residual maximum likelihood if |

profile | Analytically solve for as many covariance parameters as possible. |

zero | Calculates \(log(likelihood) - zero\), instead of just \(log(likelihood)\), in a way that maintains numerical precision if the constant |

verbose | Return additional information. See "Value" below. |

method | Fitting method to use: |

COV | Estimate the autocorrelation parameter covariance matrix. |

control | An optional argument list for the optimizer. |

trace | Report progress updates. Can be among |

level | Attempt to simplify a model if a feature's non-existence falls within this level of confidence interval. |

IC | Information criterion used for selection. Can be |

MSPE | Reject non-stationary features that increase the mean square predictive error of |

cores | Maximum number of models to fit in parallel. cores=0 will use all cores, while cores<0 will reserve abs(cores). |

... | Further arguments passed to |

Model fitting and selection first requires a prototype model with guesstimated parameters (i.e., Brownian motion with a particular diffusion rate).
The initial `ctmm`

parameter guess can be generated by the output of `ctmm.guess`

, `variogram.fit`

or manually specified with the function `ctmm(...)`

, where the argument `tau`

is explained below and additional model options described in `vignette("ctmm")`

.

By default, `tau`

(\(\tau\)) is an ordered array of autocorrelation timescales.
If `length(tau)==0`

, then an IID bi-variate Gaussian model is fit to the data.
If `length(tau)==1`

, then an Ornstein-Uhlenbeck (OU) model (Brownian motion restricted to a finite home range) is fit the data, where `tau`

is the position autocorrelation timescale. `tau=Inf`

then yields Brownian motion (BM).
If `length(tau)==2`

, then the OUF model (continuous-velocity motion restricted to a finite home range) is fit to the data, where `tau[1]`

is again the position autocorrelation timescale and `tau[2]`

is the velocity autocorrelation timescale. `tau[1]=Inf`

then yields integrated Ornstein-Uhlenbeck (IOU) motion, which is a spatially unrestricted continuous-velocity process.

Two new models were introduced in ctmm version 0.5.2 for the case of `tau[1]==tau[2]`

, which can happen with short tracks of data. When `tau[1]==tau[2]`

and `omega==0`

, the model is categorized as OUf---a special case of OUF---and the two `tau`

parameters are treated as identical. On the other hand, when `tau[1]==tau[2]`

and `omega>0`

, an oscillatory model is implemented, which we refer to as OU\(\Omega\).

The potential fitting methods---maximum likelihood (`ML`

), residual maximum likelihood (`REML`

), perturbative REML (`pREML`

), hybrid REML (`HREML`

), and perturbative hybrid REML (`pHREML`

)---are described in Fleming et al (2019). In general, `pHREML`

is the best method, though when parameter estimates lie near boundaries it can fail, in which case `ctmm.fit`

will fall back to `HREML`

, as reported by the `method`

slot of the resulting fit object.

The `control`

list can take the following arguments, with defaults shown:

`method="pNewton"`

The partial-Newton method of

`optimizer`

is default. See`optim`

for alternative methods in multiple dimensions.`precision=1/2`

Fraction of machine numerical precision to target in the maximized likelihood value. MLEs will necessarily have half this precision. On most computers,

`precision=1`

is approximately 16 decimal digits of precision for the likelihood and 8 for the MLEs.`maxit=.Machine$integer.max`

Maximum number of iterations allowed for optimization.

Model selection in `ctmm.select`

proceeds in two phases. If there are a large number of parameters that must be fit numerically (such as when error is modeled), then the target model (argument `CTMM`

) is worked toward by first fitting simpler, compatible models. The second phase proceeds by attempting to simplify the autocorrelation model and complexify the deterministic (trend) model until the information criterion fails to improve. The intent of working in these directions is to improve numerical convergence and avoid fitting trends to autocorrelation.
Note that simpler models in a nested hierarchy will only be attempted if they appear credible, which can be adjusted with the `level`

argument. `level=1`

will, therefore, always attempt a simpler model.

The leave-one-out cross validation IC, `IC="LOOCV"`

, is (-2 times) the sum of log-likelihoods of the validation data, after fitting to and conditioning on the training data. This information criterion is intended for small amounts of data where AIC/BIC are not valid, and where the questions of interest are targeted at the finest scales of the data, such as speed or occurrence. Unlike other model-selection criteria, the computational complexity of LOOCV is \(O(n^2)\), which is very slow for sample sizes on the order of 10-100 thousand locations. Furthermore, as autocorrelation in the validation data is ignored, this information criterion is not valid for making inferences at scales coarser than the sampling interval, such as home range.

The half-sample cross validation IC, `IC="HSCV"`

, is (-2 times) the sum of log-likelihoods of the validation data, after fitting to and conditioning on the training data consisting of the first (and second) halves of the data when split temporally. This information criterion is intended for when few range crossings are observed and AIC/BIC may not be valid.

The function `ctmm`

returns a prototype `ctmm`

movement-model object.
By default, `ctmm.loglike`

returns the log-likelihood of the model `CTMM`

.
`ctmm.fit`

(and `ctmm.loglike`

with `verbose=TRUE`

) returns the estimated `ctmm`

movement-model object with all of the components of `CTMM`

plus the components listed below.
`ctmm.select`

returns the best model by default, or the sorted list of attempted models if `verbose=TRUE`

, with the best model being first in the list.

`AICc`

The approximate corrected Akaike information criterion for multivariate distributions with variable numbers of unknown mean and (structured) covariance parameters (Burnham & Anderson, Eq. 7.91). This formula is only exact for IID data.

`loglike`

The log-likelihood.

`sigma`

The maximum likelihood variance/covariance estimate (possibly debiased). For the endlessly diffusing BM and IOU processes, this is instead the diffusion rate estimate.

`mu`

The maximum likelihood stationary mean vector estimate.

`COV.mu`

The covariance matrix of the

`mu`

estimate, assuming that the covariance estimate is correct.`DOF.mu`

The effective number of degrees of freedom in the estimate of

`mu`

, assuming that the autocorrelation model is correct. This can be much smaller than`length(data$t)`

if the data are autocorrelated.`COV`

Covariance of the autocovariance parameter estimate vector

`c(sigma,tau,circle)`

, as derived (asymptotically) from the`hessian`

of the log-likelihood function, and where`sigma`

is parameterized in terms of its largest variance`major`

, the ratio of the smallest to largest variance`minor`

, and`angle`

of orientation. Typically,`sigma`

's`major`

parameter is extremely correlated to`tau[1]`

, and sequential components of`tau`

are slightly correlated.

The warning "MLE is near a boundary or optim() failed" indicates that you should be using `ctmm.select`

rather than `ctmm.fit`

, because some features are not well supported by the data.

The warning "pREML failure: indefinite ML Hessian" is normal if some autocorrelation parameters cannot be well resolved.

K. P. Burnham, D. R. Anderson, ``Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach'', Springer, 2nd edition (2003).

C. H. Fleming, J. M. Calabrese, T. Mueller, K.A. Olson, P. Leimgruber, W. F. Fagan, ``From fine-scale foraging to home ranges: A semi-variance approach to identifying movement modes across spatiotemporal scales'', The American Naturalist, 183:5, E154-E167 (2014) doi: 10.1086/675504 .

C. H. Fleming, Y. Subaşı, J. M. Calabrese, ``A maximum-entropy description of animal movement'', Physical Review E, 91, 032107 (2015) doi: 10.1103/PhysRevE.91.032107 .

C. H. Fleming, D. Sheldon, E. Gurarie, W. F. Fagan, S. LaPoint, J. M. Calabrese, ``Kálmán filters for continuous-time movement models'', Ecological Informatics, 40, 8-21 (2017) doi: 10.1016/j.ecoinf.2017.04.008 .

C. H. Fleming, M. J. Noonan, E. P. Medici, J. M. Calabrese, ``Overcoming the challenge of small effective sample sizes in home-range estimation'', Methods in Ecology and Evolution 10:10, 1679-1689 (2019) doi: 10.1111/2041-210X.13270 .

C. H. Fleming and G. Péron.

The default optimization method in `ctmm`

v0.5.7 and above is `optimizer`

's `"pNewton"`

. Annecdotally, on these problems, `optimizer`

's `pNewton`

method generally outperforms `optim`

's `"Nelder-Mead"`

, which generally outperforms `optim`

's `"BFGS"`

and `"L-BFGS-B"`

methods. With default arguments, `"pNewton"`

is about half as fast as `"Nelder-Mead"`

, but is resolving about twice as much numerical precision by default.

The AICs/BICs of endlessly diffusing models like BM and IOU cannot be easily compared to the AICs/BICs of range resident models like bivariate Gaussian, OU, and OUF, as their joint likelihood functions are infinitely different. Endlessly diffusing models have to be conditioned off of an initial state, which we derive in `ctmm`

by taking the large range limit of a range-restricted process. I.e., BM is the limit OU(`Inf`

) and IOU(`tau`

) is the limit OUF(`Inf`

,`tau`

). Using comparable likelihood functions gives up statistical efficiency and the objective prior. Moreover, comparing conditional likelihoods---with the objective prior taken from the joint likelihood---does not appear to select the true model with a likelihood ratio test. Different criteria must be used to select between range resident and endlessly diffusing movement models.

Prior to v0.3.6, the univariate AICc formula was (mis)used, with the full parameter count treated as degrees of freedom in the mean. As of v.0.3.6, the mean and autocovariance parameters are treated separately in the approximate multivariate AICc formula (Burnham & Anderson, Eq. 7.91). Still, this improved formula is only exact for IID data.

Prior to v0.3.2, `ctmm.select`

would consider every possible model.
This is no longer feasible with current versions of `ctmm`

, as the number of possible models has grown too large.

# \donttest{ # Load package and data library(ctmm) data(buffalo) Cilla <- buffalo$Cilla GUESS <- ctmm.guess(Cilla,interactive=FALSE) FIT <- ctmm.fit(Cilla,GUESS) # some human-readable information summary(FIT)#> $name #> [1] "OUF anisotropic" #> #> $DOF #> mean area speed #> 10.73354 18.13596 3445.13306 #> #> $CI #> low est high #> area (square kilometers) 239.647382 403.458821 609.24124 #> τ[position] (days) 4.026011 7.505367 13.99165 #> τ[velocity] (minutes) 39.570325 42.069009 44.72547 #> speed (kilometers/day) 13.819957 14.054636 14.28926 #># }