Fit species distribution models (SDMs) [IN DEVELOPMENT]

This function fits species distribution models, sampling density models, and integrated SDMs.

sdm.fit(data,R=list(),formula=NULL,area=NULL,reference="auto",standardize=TRUE,
        integrator="MonteCarlo",error=0.01,max.mem="1 Gb",interpolate=TRUE,trace=TRUE,...)

sdm.select(data,R=list(),formula=NULL,area=NULL,verbose=FALSE,IC="AICc",trace=TRUE,...)

sdm.integrate(biased=NULL,bias=NULL,unbiased=NULL)

Arguments

data: A telemetry object.
R: A named list of rasters or time-varying raster stacks [NOT TESTED] for which Poisson regression coefficients are fit (under a log link).
formula: Formula object for \(\log(\lambda)\) referencing the elements of R and columns of data (see Details below). If not specified, a linear term will be included for every element of R.
area: A spatial polygon object defining the extent of the SDM. If left NULL, an integrated Gaussian model will be used to define the extent of the SDM, which can be a very bad model for geographic ranges.
reference: When expanding categorical predictors into indicator variables, reference="auto" will choose the most common category to be the reference category. Otherwise, the reference category can be specified by this argument.
standardize: For numerical stability, predictors are internally standardized if standardize=TRUE and no formula is specified. (The final outputs are not standardized.) Otherwise, users are responsible for standardizing their predictors.
integrator: Numerical integrator used for likelihood evaluation. Can be "MonteCarlo" or "Riemann" (IN TESTING).
error: Relative numerical error threshold for the parameter estimates and log-likelihood.
max.mem: Maximum amount of memory to allocate for availability sampling.
interpolate: Whether or not to interpolate raster values during extraction.
trace: Report progress on convergence (see Details).
verbose: Returns all candidate models if TRUE. Otherwise, only the IC-best model is returned.
IC: Model selection criterion. Can be "AIC", "AICc", or "BIC".
...: Arguments passed to rsf.fit or optimizer.
biased: A biased SDM calculated from occurrence records with non-uniform sampling density.
bias: An "SDM" calculated from data representative of the above sampling density.
unbiased: An unbiased SDM or list of RSFs.
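
As an illustration of what the standardize argument refers to, the following base R sketch fits a Poisson regression on a standardized predictor and converts the slope back to the original units. It is not the internal ctmm code: the simulated predictor, the use of glm, and all variable names are assumptions made for the example.

```r
# Illustrative sketch only; not the ctmm implementation.
# Hypothetical predictor on a large scale (e.g. elevation in metres).
set.seed(42)
x <- rnorm(200, mean = 500, sd = 120)
y <- rpois(200, exp(-2 + 0.004 * x))   # Poisson counts under a log link

# Standardize for numerical stability: zero mean, unit variance.
z <- (x - mean(x)) / sd(x)

fit_std <- glm(y ~ z, family = poisson)  # fit on the standardized scale
fit_raw <- glm(y ~ x, family = poisson)  # fit on the original scale

# Undo the standardization: a slope per standard deviation becomes
# a slope per original unit after dividing by sd(x).
beta_back <- coef(fit_std)[["z"]] / sd(x)

print(c(back_transformed = beta_back, direct = coef(fit_raw)[["x"]]))
```

The two slopes agree up to optimizer tolerance, which is why outputs can be reported on the original scale even when the fit is carried out on standardized predictors.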

Details

Instead of specifying a number of "available" points to sample and having an unknown amount of numerical error to contend with, rsf.fit specifies an estimation target error and the number of "available" points is increased until this target is met. Moreover, the output log-likelihood is that of the continuous Poisson point process, which does not depend on the number of "available" points that were sampled, though the numerical variance estimate is recorded in the VAR.loglike slot of the fit object.
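
The idea of sampling until a target error is met can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the algorithm used by rsf.fit: the toy intensity lambda(), the unit-square domain, the batch size, and the function name mc_integrate are all hypothetical.

```r
# Illustrative sketch only; not the ctmm implementation.
set.seed(1)

# Toy intensity under a log link, depending on one coordinate only.
lambda <- function(x, y, beta = 1.5) exp(beta * x)

# Keep drawing batches of uniformly sampled points until the Monte Carlo
# relative standard error of the integral drops below `target`.
mc_integrate <- function(target = 0.01, batch = 1000, max_n = 1e6) {
  values <- numeric(0)
  repeat {
    x <- runif(batch)
    y <- runif(batch)
    values <- c(values, lambda(x, y))
    n <- length(values)
    est <- mean(values)          # the domain has area 1, so the mean estimates the integral
    se <- sd(values) / sqrt(n)   # Monte Carlo standard error of the mean
    if (se / est <= target || n >= max_n) break
  }
  list(estimate = est, rel.error = se / est, n = n)
}

fit <- mc_integrate(target = 0.01)
exact <- (exp(1.5) - 1) / 1.5    # analytic value of the integral, for comparison
print(c(estimate = fit$estimate, exact = exact, n = fit$n))
```

The number of sampled points is an outcome of the error target rather than an input, which mirrors the behavior described above.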

When trace=TRUE, a number of convergence estimates are reported, including the standard deviation of the numerical error of the log-likelihood, SD[\(\log(\ell)\)], the most recent log-likelihood update, d\(\log(\ell)\), and the most recent (relative) parameter estimate updates d\(\hat{\beta}/\)SD[\(\hat{\beta}\)].

The formula object determines \(\log(\lambda)\) and can reference static rasters in R, time-dependent raster stacks in R [NOT TESTED], and time-dependent effect modifiers in the columns of data, such as provided by annotate. Any offset terms are applied under a log transformation (or multiplicatively to \(\lambda\)), and can be used to enforce hard boundaries, where offset(raster)=TRUE denotes accessible points and offset(raster)=FALSE denotes inaccessible points [NOT TESTED]. Intercept terms are ignored, as they generally do not make sense for individual Poisson point process models. This includes terms only involving the columns of data, as they lack spatial dependence.
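
The arithmetic behind a logical offset can be checked with a few lines of base R. This is only a sketch of the log-link algebra, not a call into ctmm: the coefficient, the predictor values, and the accessibility mask are hypothetical. The last lines show one way to see why an intercept carries no information once the intensity is normalized over the locations.

```r
# Illustrative sketch only; hypothetical values throughout.
beta <- 0.8
x <- c(-1, 0, 1, 2)                        # predictor at four locations
accessible <- c(TRUE, TRUE, FALSE, TRUE)   # hard-boundary mask

# The offset enters log(lambda) as log(offset): log(1) = 0, log(0) = -Inf.
log_lambda <- beta * x + log(as.numeric(accessible))
lambda <- exp(log_lambda)                  # zero wherever accessible is FALSE
print(lambda)

# Adding a constant (an intercept) to log(lambda) leaves the normalized
# intensity unchanged, since the constant cancels in the ratio.
p0 <- lambda / sum(lambda)
p1 <- exp(log_lambda + 3) / sum(exp(log_lambda + 3))
print(all.equal(p0, p1))
```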

Categorical raster variables are expanded into indicator variables, according to the reference category argument. Upon import via raster, categorical variables may need to be assigned with as.factor, or else they may be interpreted as numerical variables.
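
The expansion into indicator variables can be illustrated with base R factors. This sketch assumes a hypothetical vector of land-cover codes and uses model.matrix for the expansion; it is not the code path used inside ctmm, and it picks the most common category as the reference to mimic reference="auto".

```r
# Illustrative sketch only; hypothetical land-cover codes.
landcover <- c(1, 2, 2, 3, 2, 1, 2, 3)

# Left as numbers, the codes would be treated as one numerical predictor.
ncol(model.matrix(~ landcover))   # intercept plus a single slope column

# Declared as a factor, each non-reference category gets its own indicator.
f <- as.factor(landcover)
ref <- names(which.max(table(f)))  # most common category
f <- relevel(f, ref = ref)

# Drop the intercept column, keeping only the indicator variables.
X <- model.matrix(~ f)[, -1, drop = FALSE]
print(ref)
print(X)
```

Rows belonging to the reference category are all zero, so the fitted coefficients of the remaining indicators are interpreted relative to that category.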

References

J. M. Alston, C. H. Fleming, R. Kays, J. P. Streicher, C. T. Downs, T. Ramesh, B. Reineking, & J. M. Calabrese, "Mitigating pseudoreplication and bias in resource selection functions with autocorrelation-informed weighting", Methods in Ecology and Evolution 14:2 643-654 (2023) doi:10.1111/2041-210X.14025.

Author

C. H. Fleming

Note

It is much faster to calculate all predictors ahead of time and specify them in the R list than to reference them in the formula argument, which will calculate them as needed, saving memory.

AIC and BIC values for integrated=FALSE models do not include any penalty for the estimated location and shape of the available area, and so the true values are expected to be worse than those reported.
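
To give a sense of the size of such a missing penalty, the sketch below evaluates the textbook definitions of AIC, AICc, and BIC with and without extra parameters. Everything here is an assumption for illustration: the log-likelihood, parameter counts, and sample size are made-up numbers, the helper ic() is hypothetical, the AICc shown is the standard small-sample correction (which may differ from the one ctmm uses), and the five extra parameters correspond to the mean (2) and covariance (3) of a bivariate Gaussian.

```r
# Illustrative sketch only; hypothetical numbers and textbook formulas.
ic <- function(loglike, k, n) {
  aic <- 2 * k - 2 * loglike
  c(AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n - k - 1),
    BIC  = k * log(n) - 2 * loglike)
}

# Criteria counting only the regression coefficients.
reported <- ic(loglike = -1234.5, k = 3, n = 500)

# Same likelihood, also counting 5 parameters for the location and
# shape of a bivariate Gaussian available area.
penalized <- ic(loglike = -1234.5, k = 3 + 5, n = 500)

print(rbind(reported, penalized))
```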