occupancyTuts: Occupancy Modeling Tutorials with RPresence
Abstract
The occupancy modeling framework offers tremendous flexibility in estimating species abundance and distribution patterns while accounting for imperfect detection, and it has seen rapid growth and adoption since its introduction at the beginning of the century.
At the same time, in an era of big data, there is increasing demand for quantitative skills among young ecologists, many of whom lack the quantitative training needed to conduct research professionally.
We introduce occupancyTuts, an R package featuring 28 learnr tutorials that teach the theory and practice of occupancy modeling. The tutorials include written content, instructional videos, R exercises, and quizzes, and they cover a range of topics including statistical underpinnings, single-season and dynamic (multi-season) occupancy models, study design, and several of the “spin-off” models that extend the basic framework.
We plan to develop new tutorials that use RPresence as the analysis engine, and we welcome new tutorial contributions that use other R packages as the analysis engine.
KEYWORDS
Occupancy modeling, hierarchical modeling, species distribution modeling, RPresence, learnr
1. INTRODUCTION
Understanding how species are distributed in space and time is a central question in ecology. Abundance, distribution, and species richness are “state” variables that describe an ecological system of interest (Figure 1). These state variables are usually unknown but are, for many reasons, of interest to ecologists.
However, humans observe the true state of an ecological system imperfectly because survey methods are imperfect. Errors can arise during surveys, such as missing a species that is actually present (a false negative) or mistaking one species for another (a false positive). If ignored, such errors can produce substantially biased parameter estimates, hindering efforts to characterize ecological systems and inform management decisions.
Occupancy modeling is the de facto method for estimating state patterns while correcting for imperfect detection (MacKenzie et al. 2018a). The approach is part of a broader class of models in ecology known as “hierarchical models,” where the analytic approach separates the estimation of the state (abundance, distribution, or richness) from the imperfect observation of that state.
The seminal single-season occupancy model (MacKenzie et al. 2002) considers monitoring a target species at N sites, only some of which are occupied by the target. Each site is surveyed T times, generating “encounter histories”: vectors of 1’s and 0’s denoting detection and nondetection, respectively, on each survey occasion. For example, in a study with 3 surveys per site, the history 010 indicates that the species was not detected on the first survey, was detected on the second, and was not detected on the third. From the pattern of 1’s and 0’s recorded in the field, the approach uses maximum likelihood to estimate the key parameters of interest: \(\psi\), the probability that a site is occupied, and \(p_t\), the probability of detecting the species on survey t given that it is present. Importantly, these parameters may be functions of covariates that influence occupancy and detection. When \(\psi\) is constant across sites and detection probability varies only among surveys, the likelihood is:
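\[ L(\psi, p_1, \ldots, p_T) = \biggl[\psi^{n_\cdot}\prod_{t=1}^T p_t^{n_t} (1-p_t)^{n_\cdot-n_t}\biggr] \times \biggl[ \psi \prod_{t=1}^T (1-p_t) + (1 - \psi)\biggr] ^{N-n_\cdot} \]
where \(n_\cdot\) is the number of sites at which the species was detected at least once, \(n_t\) is the number of sites at which it was detected on survey t, and the second bracket accounts for the \(N - n_\cdot\) sites with all-zero histories, each of which is either occupied but missed on every survey or unoccupied.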
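To make the likelihood concrete, the following base-R sketch (not RPresence code) converts encounter-history strings to a detection matrix, computes the number of sites with at least one detection and the number of detections per survey, and maximizes the log of the likelihood above with `optim()`. The helper names `eh_to_matrix()` and `ss_loglik()` and the ten example histories are hypothetical, chosen only for illustration. Parameters are estimated on the logit scale so the optimizer searches an unconstrained space.

```r
# Hypothetical helper: convert strings such as "010" to a 0/1 detection matrix
# (rows = sites, columns = surveys)
eh_to_matrix <- function(eh) {
  do.call(rbind, lapply(strsplit(eh, ""), as.integer))
}

# Log-likelihood of the single-season model with constant psi and
# survey-specific p_t; theta = c(logit(psi), logit(p_1), ..., logit(p_T))
ss_loglik <- function(theta, y) {
  psi   <- plogis(theta[1])
  p     <- plogis(theta[-1])
  N     <- nrow(y)
  n_dot <- sum(rowSums(y) > 0)  # sites with at least one detection
  n_t   <- colSums(y)           # sites with a detection on survey t
  # First bracket: sites where the species was detected at least once
  ll_det <- n_dot * log(psi) + sum(n_t * log(p) + (n_dot - n_t) * log(1 - p))
  # Second bracket: all-zero sites (occupied but always missed, or unoccupied)
  ll_zero <- (N - n_dot) * log(psi * prod(1 - p) + (1 - psi))
  ll_det + ll_zero
}

eh <- c("010", "110", "000", "001", "000", "111", "000", "100", "000", "011")
y  <- eh_to_matrix(eh)

# fnscale = -1 makes optim() maximize rather than minimize
fit <- optim(par = rep(0, 1 + ncol(y)), fn = ss_loglik, y = y,
             method = "BFGS", control = list(fnscale = -1))
round(plogis(fit$par), 3)  # estimates of psi, p_1, p_2, p_3
```

Because the species was detected at 6 of the 10 sites, the estimate of \(\psi\) should exceed the naive proportion 0.6: the model attributes some all-zero histories to occupied sites where the species was missed.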
As the name implies, occupancy modeling typically focuses on uncovering species distribution patterns (presence-absence). However, several models allow estimation of abundance and species richness as well. Since the seminal paper by MacKenzie et al. (2002), there has been an explosion in the development and application of models that estimate species occurrence and occupancy dynamics while accounting for imperfect detection (Bailey et al. 2013). A recent Web of Science search on the term “occupancy modeling” yielded >6600 peer-reviewed journal articles and >7000 online resources (accessed 2023-06-27). These contributions include models that account for multiple seasons (MacKenzie et al. 2003), multiple occupancy states (MacKenzie et al. 2018b), multi-scale occupancy patterns (Nichols et al. 2008), species co-occurrence (MacKenzie et al. 2018d), and community-level patterns (MacKenzie et al. 2018c), among others. Many extensions focus on the detection process, including species misidentification (Royle & Link 2006; Miller et al. 2011), correlated detections (Hines et al. 2009), and the use of multiple survey methods (Nichols et al. 2008). An impressive span of research topics feature occupancy modeling at their core, including .
One of the primary R packages for analyzing occupancy data is RPresence (MacKenzie & Hines 2023), which incorporates code from two additional software programs: (1) Program Presence, a Windows-based program with a graphical user interface (GUI), and (2) GENPRES, which generates simulated data for occupancy analysis and can be used as a design tool to determine the effort (number of sites and/or surveys) required to estimate occupancy or detection parameters to a given level of precision. RPresence was developed so that occupancy data could be analyzed in R (R Core Team 2012) rather than through the Presence GUI. The package enables users to prepare and edit data in R and to run models with a single function call, specifying each parameter with a model formula. In other words, R is the “front end” that sends analyses to Presence or GENPRES, which do the actual computation and return the output to R.
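GENPRES itself is not reproduced here, but the idea of generating simulated occupancy data can be sketched in base R. The function `sim_eh()` below is hypothetical: it draws a true occupancy state for each of N sites with probability `psi`, then draws T survey outcomes with detection probability `p` at occupied sites only, assuming constant parameters and no false-positive detections. Repeating such simulations for different numbers of sites and surveys, and fitting a model to each simulated data set, is one way to explore the precision a candidate design might achieve.

```r
# Hypothetical simulator (not GENPRES): single-season data with constant psi
# and p, and no false-positive detections
sim_eh <- function(N, T, psi, p) {
  z <- rbinom(N, size = 1, prob = psi)  # true state: 1 = occupied, 0 = not
  # Column t holds survey t; detection probability is p at occupied sites
  # and 0 at unoccupied sites
  y <- matrix(rbinom(N * T, size = 1, prob = rep(z * p, times = T)),
              nrow = N, ncol = T)
  list(z = z, y = y)
}

set.seed(42)
sim <- sim_eh(N = 100, T = 3, psi = 0.6, p = 0.4)
mean(sim$z)               # proportion of sites truly occupied
mean(rowSums(sim$y) > 0)  # naive occupancy: sites with at least one detection
```

The naive proportion can never exceed the true proportion occupied in this sketch, which illustrates why ignoring imperfect detection tends to underestimate occupancy.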
Additional software and R packages are available for occupancy analysis as well. For example, the packages unmarked (Fiske & Chandler 2011), spOccupancy (Doser et al. 2022), and ubms (Kellner et al. 2021) provide tools for modeling species distribution and abundance patterns while accounting for measurement error. RMark (Laake 2013) is another popular offering that uses R as a front end to the software MARK (White & Burnham 2023), which provides tools for analyzing distribution and capture-mark-recapture data.
Paired with the rapid rise in hierarchical modeling is a shortfall in quantitative proficiency among early-career ecologists (Barraquand et al. 2014), potentially due to limited training in mathematics, statistics, and programming at the graduate and undergraduate levels (Cuddington et al. 2023). Surveys suggest that an astounding 75% of early-career scientists in ecology are not satisfied with their understanding of models relevant to their own field of interest (Barraquand et al. 2014). The call for increased quantitative training has been made since the turn of the century (Anderson et al. 2003) and has grown louder as rapid developments in the sciences demand the use of advanced quantitative methods. Barraquand et al. (2014) emphasize: “With the increase in availability of advanced methods, quantitative training ought to focus on (i) understanding how these methods work and (ii) when to use them.” We suggest adding (iii) “how to use them,” in light of recent calls to advance programming skills in students at any stage of their career (X Feng & Enquist 2020; Juavinett 2022).
In this article, we introduce occupancyTuts, an R package featuring 28 learnr tutorials (Aden-Buie et al. 2023) that guide users through the theory and analysis of occupancy data with the package RPresence. The tutorials include written content, instructional videos, R exercises, interactive Shiny components (Chang et al. 2022), and quizzes, and they loosely follow the book “Occupancy Estimation and Modeling” (MacKenzie et al. 2018a) and the published papers that form its foundation. In developing these tutorials, our aim was to provide hands-on quantitative training and to show users how to run occupancy analyses in R on their own computers.
2. THE occupancyTuts PACKAGE
occupancyTuts provides background and instructions for occupancy analysis with RPresence. The package is available on the CRAN repository.
```r
install.packages("occupancyTuts")
```
The canonical home of occupancyTuts is https://code.usgs.gov/vtcfwru/occupancyTuts/, where users can post issues, create merge requests, and download beta versions that are in development. Once the package is installed, learnr’s available_tutorials() function can be used to display a list of tutorials:

```r
learnr::available_tutorials(package = "occupancyTuts")[, 2:4]
```
name | title | description |
---|---|---|
binomial | occupancyTuts: Binomial Probability | An introduction to the binomial probability mass function, the binomial distribution, and binomial likelihood. |
binomialR | occupancyTuts: Binomial Probability Functions in R | An introduction to the family of binomial functions in R. |
design_matrices | occupancyTuts: Design Matrices in RPresence | An introduction to design matrices in RPresence. |
eh | occupancyTuts: Encounter Histories | An introduction to documenting survey results as encounter histories. |
gof | occupancyTuts: Goodness of fit test | An introduction to goodness of fit and how it is implemented in RPresence. |
intro | occupancyTuts: Introduction | This is the first tutorial to complete. It introduces what a learnr tutorial is and gives an overview of the available tutorials in the R package, occupancyTuts. |
model_selection | occupancyTuts: Model Selection | Model Selection tutorial. |
ms_false_positive | occupancyTuts: Multi-season Occupancy with false-positive detections | Multi-season occupancy with false positive detections tutorial. |
ms_multi_state | occupancyTuts: Multi-season, multi-state Occupancy in RPresence | Multi-season multi-state patch occupancy tutorial. |
ms_species_richness | occupancyTuts: Multi-season Species Richness Occupancy in RPresence | Multi-season species richness occupancy tutorial. |
multi_season | occupancyTuts: Multi-season Occupancy in RPresence | Multi-season patch occupancy tutorial. |
multinomial | occupancyTuts: Multinomial Likelihood | An introduction to the multinomial probability mass function, the multinomial distribution, and multinomial likelihood. |
multinomialR | occupancyTuts: Multinomial Probability Functions in R | An introduction to the family of multinomial functions in R. |
optimization | occupancyTuts: Optimization | An introduction to computer optimization procedures. |
single_season | occupancyTuts: Single Season Occupancy in RPresence | Single season patch occupancy tutorial with no covariates. |
sitecovs | occupancyTuts: Site-level Covariates | Analyzing occupancy data with site covariates. |
software | occupancyTuts: Occupancy Software | An introduction to primary software used in this package, RPresence. |
spatials | occupancyTuts: Incorporating Spatial Data | Working with spatial data. |
ss_corr_det | occupancyTuts: Single-season, Correlated Detections | Single-season correlated detections occupancy model |
ss_false_pos | occupancyTuts: Single Season Model With Identification Errors | Single-season model with identification errors tutorial. |
ss_mixture | occupancyTuts: Single Season Model With Heterogeneous Detections | Single-season model with heterogeneous detections. |
ss_multi_method | occupancyTuts: Single-season, multi-method model | Single-season multi-method occupancy model |
ss_multistate | occupancyTuts: Single Season, Multi-state Model | Single-season model with multiple states tutorial. |
ss_species_richness | occupancyTuts: Species-richness Occupancy | Species richness using occupancy models |
ss_two_species | occupancyTuts: Single-season, Two-species Occupancy | Single-season two-species interaction occupancy model |
study_design | occupancyTuts: Occupancy Study Design | An introduction to how to design an occupancy study for a target species. |
surveycovs | occupancyTuts: Survey-level Covariates | Analyzing occupancy data with survey covariates. |
wrangling | occupancyTuts: Data Wrangling | An introduction to common data wrangling techniques for occupancy analysis |
As shown, occupancyTuts includes 28 tutorials, roughly grouped into the following categories:
- Background and statistical theory: tutorials that introduce learnr tutorials and develop proficiency with probability mass functions (pmfs) and likelihood, especially the binomial and multinomial pmfs, which provide the statistical machinery behind many occupancy modeling approaches.
- Software: tutorials that introduce RPresence and discuss general optimization methods for finding maximum likelihood estimates.
- Single-season occupancy models: tutorials centered on the seminal single-season occupancy paper (MacKenzie et al. 2002), covering how to wrangle data, how to include site and survey covariates in models, and how to evaluate competing models with goodness-of-fit tests and model selection.
- Study design: a tutorial that teaches how to design an occupancy study for a target species in terms of the number of study sites and the number of repeat surveys needed to maximize precision for a given total effort (Bailey et al. 2007).
- Multi-season or dynamic occupancy models: a tutorial that introduces the multi-season, or dynamic, occupancy model, in which occupancy state can change through time (MacKenzie et al. 2003). This model is widely used for monitoring status and trends through time while accounting for errors in detection.
- Spin-off models: tutorials that introduce a host of extensions to the single-season and multi-season occupancy models, including multi-state, multi-method, and multi-species models.
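The background and software tutorials pair probability mass functions with numerical optimization. As a minimal base-R illustration (not taken from the tutorials themselves), the binomial log-likelihood for an arbitrary example of x = 7 detections in n = 10 independent surveys can be maximized with `optimize()`. The analytical maximum likelihood estimate is x/n = 0.7, which the numerical search should reproduce to within its tolerance.

```r
x <- 7   # number of "successes" (e.g., detections); arbitrary example value
n <- 10  # number of trials (e.g., surveys); arbitrary example value

# Binomial log-likelihood of p, built from the pmf dbinom()
binom_loglik <- function(p) dbinom(x, size = n, prob = p, log = TRUE)

# One-dimensional numerical search for the value of p in (0, 1)
# that maximizes the log-likelihood
mle <- optimize(binom_loglik, interval = c(0, 1), maximum = TRUE)
mle$maximum  # close to the analytical MLE, x / n = 0.7

# The multinomial analogue uses dmultinom(): the probability of observing
# counts (3, 5, 2) in 10 trials with cell probabilities (0.3, 0.5, 0.2)
dmultinom(c(3, 5, 2), prob = c(0.3, 0.5, 0.2))
```

For a single parameter, `optimize()` suffices; models with several parameters, such as the single-season occupancy model, require a multivariate optimizer such as `optim()`.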
Tutorials can be accessed via the “Tutorial” tab in RStudio, or can be launched via code. For example, the following code will launch the tutorial that introduces the single season occupancy model:
```r
learnr::run_tutorial(
  name = "single_season",
  package = "occupancyTuts"
)
```
The run_tutorial() function launches the tutorial in the user’s web browser, as shown in Figure 2. While the tutorial is open, R runs a Shiny application (Chang et al. 2022) that “listens” for commands or entries made within the tutorial itself and responds when called.
Each tutorial is divided into topics, which are listed in the left menu. Each topic may contain written content, instructional videos, R exercises, interactive Shiny widgets, and quizzes. In every tutorial, the first topic is “Prerequisites,” which identifies the required preceding tutorial and provides a list of suggested readings. Progress can be saved, allowing users to close a tutorial when needed and return to it later.
The topics follow a consistent structure across tutorials: they begin with motivating background, state the objectives, and then guide users through an analysis step by step. To demystify how field data (typically a pattern of detections and nondetections) translate into parameter estimates, many of the tutorials include videos that illustrate the nuts and bolts of the analysis in a simple spreadsheet environment. Completed tutorials can be printed as a PDF for future reference.
The final topic of every tutorial is “What’s next?” It features the tutPrePost() function, which generates a data frame of tutorial follow-ups. Follow-ups are coded as 1 or 2, where 1 indicates a tutorial that may be of interest (an FYI) and 2 indicates a suggested follow-up. For example, the code below returns the tutorials that users may be interested in after completing the “single_season” tutorial:

```r
occupancyTuts::tutPrePost(tut = "single_season", type = "post")
```
tut | follow_up | description |
---|---|---|
gof | 2 | An introduction to goodness of fit and how it is implemented in *RPresence*. |
optimization | 2 | An introduction to computer optimization procedures. |
sitecovs | 2 | Analyzing occupancy data with site covariates. |
spatials | 2 | Working with spatial data. |
ss_corr_det | 1 | Single-season correlated detections occupancy model |
ss_false_pos | 1 | Single-season model with identification errors tutorial. |
ss_mixture | 1 | Single-season model with heterogeneous detections. |
ss_multi_method | 1 | Single-season multi-method occupancy model |
ss_multistate | 1 | Single-season model with multiple states tutorial. |
ss_species_richness | 1 | Species richness using occupancy models |
ss_two_species | 1 | Single-season two-species interaction occupancy model |
study_design | 1 | An introduction to how to design an occupancy study for a target species. |
surveycovs | 1 | Analyzing occupancy data with survey covariates. |
wrangling | 2 | An introduction to common data wrangling techniques for occupancy analysis |
As shown, the suggested follow-ups to the single-season occupancy tutorial cover goodness of fit (determining whether an occupancy model adequately “fits” the observed field data), optimization (understanding how RPresence finds maximum likelihood estimates and their precision), spatial data, and data wrangling so that site-level or survey-level covariates can be included in the analysis. The “FYI” tutorials include the many spin-offs of the single-season occupancy model, such as models that allow false-positive survey results (“ss_false_pos”). If occupancyTuts is used in a classroom or workshop, the instructor should specify the order in which tutorials are to be completed.
3. BETA TESTING
occupancyTuts served as the basis for a course in Occupancy Modeling at the University of Vermont and the National Conservation Training Center in 2022 and 2023. Overall, students were very receptive to the teaching format and were able to run complex single- or multi-season models featuring covariates, model selection, and goodness-of-fit analysis, ultimately producing written summaries that could be used in a thesis or dissertation chapter. Students also provided suggestions for improving the tutorials.
4. SUMMARY AND DISCUSSION
The R package occupancyTuts provides a new entry into teaching quantitative methods to students of ecology. As an open-access contribution that anyone with a computer and internet access can download, occupancyTuts provides one option for increasing quantitative training opportunities for students; it can be used in in-person and online classrooms (Touchon & McCoy 2016; Bachner & O’Bryrne 2019), workshops (LaTourrette et al. 2021), clubs (Johnston et al. 2019; Hagan 2020), or independent study. Using learnr as a teaching medium builds not only background in ecological theory and statistical underpinnings but also confidence in coding (Juavinett 2022) and student performance (Freeman et al. 2014).
We plan to develop new tutorials that use RPresence as the analysis engine. In-progress tutorials can be installed with the following code:

```r
# install.packages("remotes")  # if remotes is not already installed
remotes::install_gitlab(
  repo = "vtcfwru/occupancyTuts",
  host = "code.usgs.gov",
  dependencies = TRUE
)
```
The master branch can also be manually downloaded and installed from .zip files (Windows users) and tar.gz files (Mac and Linux users) from:
https://code.usgs.gov/vtcfwru/-/archive/master/occupancyTuts-master.zip
https://code.usgs.gov/vtcfwru/-/archive/master/occupancyTuts-master.tar.gz
We welcome new tutorial contributions that use other R packages as the analysis engine. Such contributions can make use of the background content provided by occupancyTuts, but provide alternative guidance for running models in a package of choice.
Acknowledgments
We thank Shawn Haskell, Alexej Siren, and the many enthusiastic students enrolled in Occupancy Modeling at the University of Vermont and the National Conservation Training Center for feedback on the package tutorials. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The Vermont Cooperative Fish and Wildlife Research Unit is jointly supported by the U.S. Geological Survey, University of Vermont, Vermont Fish and Wildlife Department, and Wildlife Management Institute.