occupancyTuts: Occupancy Modeling Tutorials with RPresence

Authors
Affiliations

Therese Donovan

U.S. Geological Survey, Vermont Cooperative Fish and Wildlife Research Unit

James Hines

U.S. Geological Survey, Eastern Ecological Science Center

Darryl MacKenzie

Proteus Consulting

Abstract

  1. The occupancy modeling framework offers tremendous flexibility in estimating species abundance and distribution patterns while accounting for imperfect detection, and has seen rapid growth and adoption since its introduction at the beginning of the century.

  2. At the same time, in an era of big data, there are increasing demands for quantitative skills among young ecologists, many of whom lack the quantitative training needed to conduct research professionally.

  3. We introduce occupancyTuts, an R package that features 28 learnr tutorials that teach the statistical underpinnings of several occupancy models. The tutorials include written content, instructional videos, R exercises, and quiz elements, covering topics that include probability and likelihood, single-season and dynamic (multi-season) occupancy models, study design, and several of the “spin-off” models that extend the basic framework.

  4. We plan for development of new tutorials that use RPresence as the analysis engine, and welcome new tutorial contributions that use other R packages as the analysis engine.

KEYWORDS

Occupancy modeling, hierarchical modeling, species distribution modeling, RPresence, learnr

1. INTRODUCTION

Understanding how species are distributed in space and time is a central question in ecology. Abundance, distribution, and species richness patterns are “state” variables that describe an ecological system of interest (Figure 1). These state variables are often unknown but are, for any number of reasons, of interest to ecologists.

Figure 1. The difference between abundance and distribution. The dots represent individuals of a target species distributed in space, and a grid is imposed on this point distribution. In the left panel, the labels give the abundance of individuals per grid cell, while in the right panel the labels give the distribution of the target species in terms of presence (1) or absence (0). The species richness state variable aggregates distribution across multiple species.

However, humans observe the true state of an ecological system imperfectly because survey methods are imperfect. That is, errors can arise when conducting surveys, such as missing a species that is actually present (a false negative) or mistaking one species for another (a false positive). Such errors can lead to substantially biased parameter estimates, hindering progress in characterizing ecological systems and informing management decisions.
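To make the consequence of false negatives concrete, the base R sketch below simulates detection data under assumed values (\(\psi = 0.6\), \(p = 0.3\), 1,000 sites, 3 surveys per site; all chosen only for illustration) and compares the true proportion of occupied sites with the naive estimate, the proportion of sites at which the species was detected at least once. Because a species can be missed on every survey of an occupied site, the naive estimate can never exceed the true proportion, and its expected value is \(\psi[1 - (1 - p)^T]\) for \(T\) surveys.

```r
# Sketch: naive occupancy (sites with >= 1 detection) vs. truth under imperfect detection.
# All parameter values are assumptions chosen for illustration.
set.seed(42)
n_sites   <- 1000  # number of sites (N)
n_surveys <- 3     # repeat surveys per site (T)
psi <- 0.6         # true probability that a site is occupied
p   <- 0.3         # per-survey detection probability at an occupied site

z <- rbinom(n_sites, size = 1, prob = psi)  # latent true state: 1 = occupied, 0 = unoccupied
# Detection matrix (sites x surveys); detection probability is 0 wherever z = 0,
# so this simulation allows false negatives but no false positives.
y <- matrix(rbinom(n_sites * n_surveys, size = 1, prob = rep(z * p, n_surveys)),
            nrow = n_sites, ncol = n_surveys)

true_prop      <- mean(z)                        # realized proportion of occupied sites
naive          <- mean(rowSums(y) > 0)           # proportion of sites with >= 1 detection
expected_naive <- psi * (1 - (1 - p)^n_surveys)  # 0.6 * (1 - 0.7^3) = 0.3942
round(c(true = true_prop, naive = naive, expected_naive = expected_naive), 3)
```

Under these assumed values the naive estimate is expected to fall near 0.39 even though roughly 60% of sites are occupied, which is the kind of bias occupancy models are designed to correct.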

Occupancy modeling is the de facto method for estimating state patterns while correcting for imperfect detection (MacKenzie et al. 2018a). The approach is part of a broader class of models in ecology known as “hierarchical models,” where the analytic approach separates the estimation of the state (abundance, distribution, or richness) from the imperfect observation of that state.

The seminal single-season occupancy model (MacKenzie et al. 2002) considers monitoring a target species at N sites, only some of which are occupied. Each site is surveyed T times, generating an “encounter history” for each site: a vector of 1’s and 0’s denoting detection and nondetection, respectively, on each survey occasion. For example, in a study in which 3 surveys are conducted at each site, the history 010 indicates that the species was not detected on the first survey, was detected on the second, and was not detected on the third. From the pattern of 0’s and 1’s recorded in the field, the approach uses maximum likelihood methods to estimate the key parameters of interest: \(\psi\), the probability that a site is occupied, and \(p_t\), the probability of detecting the species on survey t given presence. Importantly, these parameters may be functions of covariates that influence occupancy and detection. Without covariates, and with detection allowed to vary among surveys, the likelihood is as follows, where \(n.\) is the number of sites at which the species was detected at least once and \(n_t\) is the number of sites at which it was detected on survey t:

\[ L(\psi, p_1, \ldots, p_T) = \biggl[\psi^{n.}\prod_{t=1}^T p_t^{n_t} (1-p_t)^{n.-n_t}\biggr] \times \biggl[ \psi \prod_{t=1}^T (1-p_t) + (1 - \psi)\biggr] ^{N-n.} \]
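The likelihood above can be checked numerically. The base R sketch below is our own illustration, not RPresence code. It computes the probability of a single encounter history (reproducing the 010 example with assumed values \(\psi = 0.8\) and \(p = (0.5, 0.4, 0.3)\)), evaluates the log-likelihood from the summary statistics \(n.\) (sites with at least one detection) and \(n_t\) (detections on survey t), and maximizes it with R's general-purpose optimizer optim() on the logit scale. The simulated data (200 sites, 3 surveys, \(\psi = 0.7\)) are assumptions, and the code assumes no missing surveys.

```r
# Sketch (base R) of the single-season likelihood with constant psi and survey-specific p_t.

# Probability of one encounter history h (0/1 vector of length T)
history_prob <- function(h, psi, p) {
  if (any(h == 1)) {
    psi * prod(p^h * (1 - p)^(1 - h))  # occupied, and detected exactly as recorded
  } else {
    psi * prod(1 - p) + (1 - psi)      # occupied but never detected, or unoccupied
  }
}
history_prob(c(0, 1, 0), psi = 0.8, p = c(0.5, 0.4, 0.3))  # 0.8 * (1 - 0.5) * 0.4 * (1 - 0.3) = 0.112

# Log-likelihood from the summary statistics n. and n_t (y is a sites x surveys 0/1 matrix)
occ_loglik <- function(psi, p, y) {
  N     <- nrow(y)
  n_dot <- sum(rowSums(y) > 0)  # n.: sites with at least one detection
  n_t   <- colSums(y)           # n_t: detections on survey t
  n_dot * log(psi) +
    sum(n_t * log(p) + (n_dot - n_t) * log(1 - p)) +
    (N - n_dot) * log(psi * prod(1 - p) + (1 - psi))
}

# Negative log-likelihood on the logit scale, so optim() searches an unbounded space
nll <- function(theta, y) -occ_loglik(plogis(theta[1]), plogis(theta[-1]), y)

# Simulated data under assumed values (illustration only)
set.seed(1)
z      <- rbinom(200, size = 1, prob = 0.7)                # true occupancy states
p_true <- c(0.4, 0.5, 0.3)                                 # survey-specific detection
y      <- sapply(p_true, function(pt) rbinom(200, 1, z * pt))  # 200 x 3 detection matrix

fit <- optim(par = rep(0, 1 + ncol(y)), fn = nll, y = y, method = "BFGS", hessian = TRUE)
psi_hat  <- plogis(fit$par[1])
p_hat    <- plogis(fit$par[-1])
se_logit <- sqrt(diag(solve(fit$hessian)))  # asymptotic standard errors on the logit scale
round(c(psi = psi_hat, p = p_hat), 3)
```

Summing the log of history_prob() over all sites gives the same value as occ_loglik(), which is a useful check that the summary-statistic form of the likelihood matches the site-by-site form.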

As the name implies, occupancy modeling typically focuses on uncovering species distribution patterns (presence-absence). However, several models allow estimation of abundance and species richness as well. Since the seminal paper by MacKenzie et al. (2002), there has been an explosion in the development and application of models aimed at estimating species occurrence and occupancy dynamics while accounting for imperfect detection (Bailey et al. 2013). A recent Web of Science search on the term “occupancy modeling” yielded >6600 peer-reviewed journal articles and >7000 online resources (accessed 2023-06-27). These contributions include models that account for multiple seasons (MacKenzie et al. 2003), multiple occupancy states (MacKenzie et al. 2018b), multi-scale occupancy patterns (Nichols et al. 2008), species co-occurrence (MacKenzie et al. 2018d), and community-level patterns (MacKenzie et al. 2018c), among others. Many extensions focus on the detection process, including species misidentification (Royle & Link 2006; Miller et al. 2011), correlated detections (Hines et al. 2009), and the use of multiple survey methods (Nichols et al. 2008). An impressive span of research topics features occupancy modeling at its core, including .

One of the primary R packages for analyzing occupancy data is RPresence (MacKenzie & Hines 2023), which incorporates code from two additional programs: (1) Program Presence, a Windows-based program with a graphical user interface (GUI), and (2) GENPRES, which generates simulated data for occupancy analysis. GENPRES can be used as a design tool to determine the effort (number of sites and/or surveys) required to estimate occupancy or detection parameters to a given level of precision. RPresence was developed so that occupancy data can be analyzed in R (R Core Team 2012) rather than through the Presence GUI. The package enables users to gather and edit data in R and to run models with a single function call, using formulae to specify the parameters. In other words, R is the “front end” that sends analyses to Presence or GENPRES, which do the actual computation and then return the output to R.
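GENPRES itself is not reproduced here, but the idea of using simulated data as a design tool can be sketched in base R. The code below assumes \(\psi = 0.6\), \(p = 0.3\), a constant-detection model, and a fixed budget of 300 surveys, then compares three hypothetical allocations of that budget between sites and repeat surveys. For each design it simulates 200 data sets, fits each by maximum likelihood, and reports the mean and standard deviation of \(\hat\psi\) across replicates. The helper names nll_const() and simulate_design() are our own, and all values are illustrative assumptions.

```r
# Sketch of a simulation-based design comparison (base R; not GENPRES itself).
# Assumed truth: psi = 0.6, p = 0.3; total effort fixed at 300 surveys per design.

# Hypothetical helper: negative log-likelihood with constant psi and constant p
nll_const <- function(theta, y) {
  psi <- plogis(theta[1]); p <- plogis(theta[2])
  K     <- ncol(y)
  n_dot <- sum(rowSums(y) > 0)  # sites with >= 1 detection
  d     <- sum(y)               # total detections across all sites and surveys
  -(n_dot * log(psi) + d * log(p) + (K * n_dot - d) * log(1 - p) +
      (nrow(y) - n_dot) * log(1 - psi * (1 - (1 - p)^K)))
}

# Hypothetical helper: simulate n_reps data sets for one design and summarize psi-hat
simulate_design <- function(n_sites, n_surveys, psi, p, n_reps = 200) {
  psi_hat <- replicate(n_reps, {
    z <- rbinom(n_sites, 1, psi)
    y <- matrix(rbinom(n_sites * n_surveys, 1, rep(z * p, n_surveys)),
                nrow = n_sites, ncol = n_surveys)
    fit <- optim(c(0, 0), nll_const, y = y, method = "BFGS")
    plogis(fit$par[1])
  })
  c(mean = mean(psi_hat), sd = sd(psi_hat))
}

set.seed(2023)
designs <- list(c(150, 2), c(100, 3), c(60, 5))  # (sites, surveys per site); 300 surveys each
results <- t(sapply(designs, function(d) simulate_design(d[1], d[2], psi = 0.6, p = 0.3)))
rownames(results) <- sapply(designs, function(d) paste0(d[1], " sites x ", d[2], " surveys"))
round(results, 3)
```

Changing the assumed values of \(\psi\) and \(p\) can change which allocation gives the most precise \(\hat\psi\), which is why design tools simulate under the values expected for the target species.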

Additional software and R packages are available for occupancy analysis as well. For example, the packages unmarked (Fiske & Chandler 2011), spOccupancy (Doser et al. 2022), and ubms (Kellner et al. 2021) provide tools for modeling species distribution and abundance patterns while accounting for measurement error. RMark (Laake 2013) is another popular offering that uses R as a front end to the software MARK (White & Burnham 2023), which provides tools for analyzing distribution and capture-mark-recapture data.

Paired with the rapid rise in hierarchical modeling is a lack of quantitative skills among early-career ecologists (Barraquand et al. 2014), potentially due to limited training in mathematics, statistics, and programming at the undergraduate and graduate levels (Cuddington et al. 2023). A survey found that an astounding 75% of early-career scientists in ecology were not satisfied with their understanding of models relevant to their own field of interest (Barraquand et al. 2014). The call for increased quantitative training has been echoed since the turn of the century (Anderson et al. 2003) and has grown louder as rapid developments in the sciences demand the use of advanced quantitative methods. Barraquand et al. (2014) emphasize: “With the increase in availability of advanced methods, quantitative training ought to focus on (i) understanding how these methods work and (ii) when to use them.” We suggest adding (iii) “how to use them,” in light of recent calls to advance programming skills in students at any stage of their career (Feng et al. 2020; Juavinett 2022).

In this article, we introduce occupancyTuts, an R package featuring 28 learnr tutorials (Aden-Buie et al. 2023) that guide users through the theory and analysis of occupancy data with the package RPresence. The tutorials include written content, instructional videos, R exercises, interactive Shiny components (Chang et al. 2022), and quiz elements that loosely follow the book “Occupancy Estimation and Modeling” and the published papers that form its foundation (MacKenzie et al. 2018a). In developing these tutorials, our aim was to provide hands-on quantitative training and to show users how to run occupancy analyses in R on their own computers.

2. THE occupancyTuts PACKAGE

occupancyTuts provides background and instructions for occupancy analysis with RPresence. The package is available on the CRAN repository.

install.packages("occupancyTuts")

The canonical home of occupancyTuts is https://code.usgs.gov/vtcfwru/occupancyTuts/, where users can post issues, create merge requests, and download beta versions that are in development. Once installed, learnr’s available_tutorials() function can be used to display a list of tutorials:

learnr::available_tutorials(package = "occupancyTuts")[,2:4]
name | title | description
binomial | occupancyTuts: Binomial Probability | An introduction to the binomial probability mass function, the binomial distribution, and binomial likelihood.
binomialR | occupancyTuts: Binomial Probability Functions in R | An introduction to the family of binomial functions in R.
design_matrices | occupancyTuts: Design Matrices in RPresence | An introduction to design matrices in RPresence.
eh | occupancyTuts: Encounter Histories | An introduction to documenting survey results as encounter histories.
gof | occupancyTuts: Goodness of fit test | An introduction to goodness of fit and how it is implemented in RPresence.
intro | occupancyTuts: Introduction | This is the first tutorial to complete. It introduces what a learnr tutorial is and gives an overview of the available tutorials in the R package, occupancyTuts.
model_selection | occupancyTuts: Model Selection | Model Selection tutorial.
ms_false_positive | occupancyTuts: Multi-season Occupancy with false-positive detections | Multi-season occupancy with false positive detections tutorial.
ms_multi_state | occupancyTuts: Multi-season, multi-state Occupancy in RPresence | Multi-season multi-state patch occupancy tutorial.
ms_species_richness | occupancyTuts: Multi-season Species Richness Occupancy in RPresence | Multi-season species richness occupancy tutorial.
multi_season | occupancyTuts: Multi-season Occupancy in RPresence | Multi-season patch occupancy tutorial.
multinomial | occupancyTuts: Multinomial Likelihood | An introduction to the multinomial probability mass function, the multinomial distribution, and multinomial likelihood.
multinomialR | occupancyTuts: Multinomial Probability Functions in R | An introduction to the family of multinomial functions in R.
optimization | occupancyTuts: Optimization | An introduction to computer optimization procedures.
single_season | occupancyTuts: Single Season Occupancy in RPresence | Single season patch occupancy tutorial with no covariates.
sitecovs | occupancyTuts: Site-level Covariates | Analyzing occupancy data with site covariates.
software | occupancyTuts: Occupancy Software | An introduction to primary software used in this package, RPresence.
spatials | occupancyTuts: Incorporating Spatial Data | Working with spatial data.
ss_corr_det | occupancyTuts: Single-season, Correlated Detections | Single-season correlated detections occupancy model
ss_false_pos | occupancyTuts: Single Season Model With Identification Errors | Single-season model with identification errors tutorial.
ss_mixture | occupancyTuts: Single Season Model With Heterogeneous Detections | Single-season model with heterogeneous detections.
ss_multi_method | occupancyTuts: Single-season, multi-method model | Single-season multi-method occupancy model
ss_multistate | occupancyTuts: Single Season, Multi-state Model | Single-season model with multiple states tutorial.
ss_species_richness | occupancyTuts: Species-richness Occupancy | Species richness using occupancy models
ss_two_species | occupancyTuts: Single-season, Two-species Occupancy | Single-season two-species interaction occupancy model
study_design | occupancyTuts: Occupancy Study Design | An introduction to how to design an occupancy study for a target species.
surveycovs | occupancyTuts: Survey-level Covariates | Analyzing occupancy data with survey covariates.
wrangling | occupancyTuts: Data Wrangling | An introduction to common data wrangling techniques for occupancy analysis

As shown, occupancyTuts includes 28 tutorials, roughly grouped into the following categories:

  • Background and statistical theory - tutorials that introduce learnr tutorials and develop proficiency in probability mass functions (pmfs) and likelihood, especially the binomial and multinomial pmfs, which provide the statistical machinery behind many occupancy modeling approaches.

  • Software - tutorials that introduce RPresence and discuss general optimization methods for finding maximum likelihood estimates.

  • Single season occupancy models - tutorials centered around the seminal single season occupancy paper (MacKenzie et al. 2002), how to wrangle data and incorporate site and survey covariates into models, and how to evaluate competing models with goodness of fit and model selection methods.

  • Study design - tutorials that teach how to design an occupancy study for a target species, in particular how to allocate effort between the number of study sites and the number of repeat surveys so as to maximize precision (Bailey et al. 2007).

  • Multi-season or dynamic occupancy models - tutorial that introduces the multi-season or dynamic occupancy model in which occupancy state changes through time (MacKenzie et al. 2003). This model is very popular for monitoring status and trend through time while accounting for errors in detection.

  • Spin-off models - tutorials that introduce a host of extensions to the single season and multi-season occupancy models, including multi-state, multi-method, and multi-species models.
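The binomial background and the study design topics meet in a simple calculation. Assuming a constant per-survey detection probability \(p\) at an occupied site, the number of detections in \(T\) surveys is binomial, so the probability of detecting the species at least once is \(p^* = 1 - (1 - p)^T\), that is, one minus the binomial probability of zero detections. The sketch below, with an assumed \(p = 0.3\), finds the fewest surveys for which \(p^* \ge 0.95\). This is only one ingredient of study design, which also depends on \(\psi\) and on the number of sites (Bailey et al. 2007).

```r
# Sketch: cumulative detection probability p* = 1 - (1 - p)^T at an occupied site,
# computed as 1 minus the binomial probability of zero detections in T surveys.
p         <- 0.3    # assumed per-survey detection probability
n_surveys <- 1:10   # candidate numbers of repeat surveys (T)
p_star    <- 1 - dbinom(0, size = n_surveys, prob = p)
stopifnot(isTRUE(all.equal(p_star, 1 - (1 - p)^n_surveys)))  # same quantity in closed form

# Fewest repeat surveys giving at least a 95% chance of detecting the species at an occupied site
surveys_needed <- min(n_surveys[p_star >= 0.95])
surveys_needed  # 9, since 1 - 0.7^8 is about 0.942 and 1 - 0.7^9 is about 0.960
round(setNames(p_star, n_surveys), 3)
```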

Tutorials can be accessed via the “Tutorial” tab in RStudio, or can be launched via code. For example, the following code will launch the tutorial that introduces the single season occupancy model:

learnr::run_tutorial(
  name = "single_season",
  package = "occupancyTuts"
)

The run_tutorial() function launches the tutorial in the user’s web browser, as shown in Figure 2. When a tutorial is launched, R runs a Shiny application (Chang et al. 2022) that “listens” for commands or entries made within the tutorial itself and responds when called.

Each tutorial is divided into topics, which are listed in the left menu. Each topic may contain written content, instructional videos, R exercises, interactive Shiny widgets, and quiz elements. In every tutorial, the first topic is “Prerequisites,” which identifies the required preceding tutorial and provides a list of suggested readings. Progress can be saved, allowing users to close a tutorial when needed and return to it at a later time.

Figure 2. Screenshot of the “single_season” occupancyTuts tutorial that launches in the user’s web browser when called with learnr’s “run_tutorial” function. Each tutorial consists of several topics highlighted in the left menu.

The topics are consistent among tutorials: each begins with a motivating background, followed by objectives, and then guides users through an analysis step by step. To demystify how field data (typically a pattern of detections and non-detections) translate into parameter estimates, many of the tutorials include videos that illustrate the nuts and bolts of the analysis in a simple spreadsheet environment. Completed tutorials can be printed as a PDF for future reference.

The final topic of every tutorial is “What’s next?”, which features the tutPrePost() function. This function generates a data frame of tutorial follow-ups, coded as 1 or 2, where 1 indicates a tutorial that may be of interest (an FYI) and 2 indicates a suggested follow-up. For example, the code below returns the tutorials that users may be interested in after completing the “single_season” tutorial:

occupancyTuts::tutPrePost(tut = "single_season", type = "post")
Suggested follow-ups are listed below (1 = fyi, 2 = suggested):
To run a tutorial, use learnr::run_tutorial(*name*, package = occupancyTuts)
tut | follow_up | description
gof | 2 | An introduction to goodness of fit and how it is implemented in RPresence.
optimization | 2 | An introduction to computer optimization procedures.
sitecovs | 2 | Analyzing occupancy data with site covariates.
spatials | 2 | Working with spatial data.
ss_corr_det | 1 | Single-season correlated detections occupancy model
ss_false_pos | 1 | Single-season model with identification errors tutorial.
ss_mixture | 1 | Single-season model with heterogeneous detections.
ss_multi_method | 1 | Single-season multi-method occupancy model
ss_multistate | 1 | Single-season model with multiple states tutorial.
ss_species_richness | 1 | Species richness using occupancy models
ss_two_species | 1 | Single-season two-species interaction occupancy model
study_design | 1 | An introduction to how to design an occupancy study for a target species.
surveycovs | 1 | Analyzing occupancy data with survey covariates.
wrangling | 2 | An introduction to common data wrangling techniques for occupancy analysis

As shown, the suggested follow-up tutorials to the single-season occupancy model include the topics of goodness of fit (determining whether an occupancy model “fits” the observed field data), optimization (understanding how RPresence finds the maximum likelihood estimates and their precision), and data wrangling (preparing data so that site-level or survey-level covariates can be included in the analysis). The “FYI” tutorials include the many spin-offs of the single season occupancy model, such as a model that allows false positive survey results (“ss_false_pos”). If occupancyTuts is used in a classroom or workshop, the instructor should specify the order in which students work through the tutorials.

3. BETA TESTING

occupancyTuts served as the basis for a course in Occupancy Modeling at the University of Vermont and the National Conservation Training Center in 2022 and 2023. Overall, students were very receptive to the teaching format and were able to run complex single- or multi-season models featuring covariates, model selection, and goodness of fit analysis, ultimately producing written summaries that could be used in a thesis or dissertation chapter. Students also provided suggestions for improving the tutorials.

4. SUMMARY AND DISCUSSION

The R package occupancyTuts offers a new resource for teaching quantitative methods to students of ecology. As an open-access package that anyone with a computer and internet access can download, occupancyTuts provides one option for increasing quantitative training opportunities for students, and it can be used in in-person and online classrooms (Touchon & McCoy 2016; Bachner & O’Bryrne 2019), workshops (LaTourrette et al. 2021), clubs (Johnston et al. 2019; Hagan 2020), or independent study. Using learnr as a teaching medium builds not only a background in ecological theory and statistical underpinnings but also confidence in coding (Juavinett 2022), and its active-learning format can improve student performance (Freeman et al. 2014).

We plan to develop new tutorials that use RPresence as the analysis engine. In-progress tutorials can be installed from the development repository with the following code:

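# Requires the remotes package: install.packages("remotes")
remotes::install_gitlab(
  repo = "vtcfwru/occupancyTuts",  # namespace/project on the USGS GitLab host
  host = "code.usgs.gov",
  dependencies = TRUE               # also install packages the tutorials depend on
)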

The master branch can also be downloaded manually and installed from a .zip file (Windows users) or a .tar.gz file (Mac and Linux users):

  • https://code.usgs.gov/vtcfwru/occupancyTuts/-/archive/master/occupancyTuts-master.zip

  • https://code.usgs.gov/vtcfwru/occupancyTuts/-/archive/master/occupancyTuts-master.tar.gz

We welcome new tutorial contributions that use other R packages as the analysis engine. Such contributions can make use of the background content provided by occupancyTuts, but provide alternative guidance for running models in a package of choice.

Acknowledgments

We thank Shawn Haskell, Alexej Siren, and the many enthusiastic students enrolled in Occupancy Modeling at the University of Vermont and the National Conservation Training Center for feedback on the package tutorials. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The Vermont Cooperative Fish and Wildlife Research Unit is jointly supported by the U.S. Geological Survey, University of Vermont, Vermont Fish and Wildlife Department, and Wildlife Management Institute.

References

Aden-Buie, G., Schloerke, B., Allaire, J. & Rossell Hayes, A. (2023). learnr: Interactive tutorials for R.
Anderson, D., Cooch, E., Gutierrez, R., Krebs, C., Lindberg, M., Pollock, K., Ribic, C. & Shenk, T. (2003). Rigorous science: Suggestions on how to raise the bar. Wildlife Society Bulletin, 31, 296–305.
Bachner, J. & O’Bryrne, S. (2019). Teaching quantitative skills in online courses: Today’s key areas of focus and effective learning tools. Journal of Political Science Education, 17, 297–310.
Bailey, L.L., Hines, J.E., Nichols, J.D. & MacKenzie, D.I. (2007). Sampling design trade-offs in occupancy studies with imperfect detection: Examples and software. Ecological Applications, 17, 281–290.
Bailey, L.L., MacKenzie, D.I. & Nichols, J.D. (2013). Advances and applications of occupancy models. Methods in Ecology and Evolution, 5, 1269–1279.
Barraquand, F., Ezard, T., Jørgensen, P., Zimmerman, N., Chamberlain, S., Salguero-Gómez, R., Curran, T. & Poisot, T. (2014). Lack of quantitative training among early-career ecologists: A survey of the problem and potential solutions. PeerJ, 2, e285.
Chang, W., Cheng, J., Allaire, J., Sievert, C., Schloerke, B., Xie, Y., Allen, J., McPherson, J., Dipert, A. & Borges, B. (2022). Shiny: Web application framework for R.
Cuddington, K., Abbott, K.C., Adler, F.R., Aydeniz, M., Dale, R., Gross, L.J., Hastings, A., Hobson, E.A., Karatayev, V.A., Killion, A., Madamanchi, A., Marraffini, M.L., McCombs, A.L., Samyono, W., Shiu, S.-H., Watanabe, K.H. & White, E.R. (2023). Challenges and opportunities to build quantitative self-confidence in biologists. BioScience, 73, 364–375.
Doser, J.W., Finley, A.O., Kéry, M. & Zipkin, E. (2022). spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models. Methods in Ecology and Evolution.
Fiske, I. & Chandler, R. (2011). unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. Journal of Statistical Software, 43, 1–23.
Freeman, S., Eddy, S.L., McDonough, M., Smith, M.K., Okoroafor, N., Jordt, H. & Wenderoth, M.P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111, 8410–8415.
Hagan, A.K. (2020). Ten simple rules to increase computational skills among biologists with code clubs. PLOS Computational Biology, 16, e1008119.
Hines, J., Nichols, J., Royle, J., MacKenzie, D., Gopalaswamy, A., Kumar, S. & Karanth, K. (2009). Tigers on trails: Occupancy modeling for cluster sampling. Ecological Applications, 20, 1456–1466.
Johnston, L., Bonsma-Fisher, M., Ostblom, J., Hasan, A., Santangelo, J., Croome, L., Tran, L., Andrede, E. de & Mahallati, S. (2019). A graduate student-led participatory live-coding quantitative methods course in R: Experiences on initiating, developing, and teaching. Journal of Open Source Education, 2, 49.
Juavinett, A.L. (2022). The next generation of neuroscientists needs to learn how to code, and we need new ways to teach them. Neuron, 110, 576–578.
Kellner, K.F., Fowler, N.L., Petroelje, T.R., Kautz, T.M., Beyer, D.E. & Belant, J.L. (2021). ubms: An R package for fitting hierarchical occupancy and N-mixture abundance models in a Bayesian framework. Methods in Ecology and Evolution, 13, 577–584.
Laake, J.L. (2013). RMark: An R interface for analysis of capture-recapture data with MARK. Alaska Fish. Sci. Cent., NOAA, Natl. Mar. Fish. Serv., Seattle, WA.
LaTourrette, K., Stengel, A. & Clarke, J. (2021). Student-led workshops: Filling skills gaps in computational research for life sciences. Natural Sciences Education, 50, e20052.
MacKenzie, D. & Hines, J. (2023). RPresence: R interface for program PRESENCE.
MacKenzie, D.I., Nichols, J.D., Hines, J.E., Knutson, M.G. & Franklin, A.B. (2003). Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology, 84, 2200–2207.
MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A. & Langtimm, C.A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83, 2248–2255.
MacKenzie, D., Nichols, J., Royle, J., Pollock, K., Bailey, L. & Hines, J. (2018a). Occupancy estimation and modeling (2nd ed.). Elsevier.
MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Bailey, L.L. & Hines, J.E. (2018b). More than two occupancy states. In: Occupancy estimation and modeling (2nd ed.), pp. 377–397. Elsevier.
MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Bailey, L.L. & Hines, J.E. (2018c). Occupancy in community-level studies. In: Occupancy estimation and modeling (2nd ed.), pp. 557–583. Elsevier.
MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Bailey, L.L. & Hines, J.E. (2018d). Species co-occurrence. In: Occupancy estimation and modeling (2nd ed.), pp. 509–556. Elsevier.
Miller, D.A., Nichols, J.D., McClintock, B.T., Grant, E.H.C., Bailey, L.L. & Weir, L.A. (2011). Improving occupancy estimation when two types of observational error occur: non-detection and species misidentification. Ecology, 92, 1422–1428.
Nichols, J.D., Bailey, L.L., O’Connell Jr., A.F., Talancy, N.W., Campbell Grant, E.H., Gilbert, A.T., Annand, E.M., Husband, T.P. & Hines, J.E. (2008). Multi-scale occupancy estimation and modelling using multiple detection methods. Journal of Applied Ecology, 45, 1321–1329.
R Core Team. (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Royle, J.A. & Link, W.A. (2006). Generalized site occupancy models allowing for false positive and false negative errors. Ecology, 87, 835–841.
Touchon, J.C. & McCoy, M.W. (2016). The mismatch between current statistical practice and doctoral training in ecology. Ecosphere, 7, e01394.
White, G. & Burnham, K. (2023). Program MARK: Survival estimation from populations of marked animals. Bird Study, 46, 120–139.
Feng, X., Qiao, H. & Enquist, B.J. (2020). Doubling demands in programming skills call for ecoinformatics education. Frontiers in Ecology and the Environment, 18, 123–124.