
7 Single Season Occupancy Models

7.1 Introduction

The last few chapters have been heavy on statistical probability, but we promise they provided important background material. We know you are itching to get back to the main problem at hand: estimating the probability that sites are occupied by a target species of interest. In this chapter, we turn to single season occupancy models, which follow MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle, and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83:2248-2255. The article is available on the web at http://www.uvm.edu/envnr/vtcfwru/spreadsheets/Occupancy/Occupancy3.htm, and the model is also described in Chapter 4 of the book, Occupancy Estimation and Modeling.

7.2 Detection histories

For occupancy models, the goal is to determine the probability that a site is occupied by a target of interest, usually a species, given that detection is imperfect. Thus, we are interested in two things. First, is the target present at a site? What factors influence the presence or absence of a target at a site? Second, for targets that are present, is the target detected by an observer? What factors influence the detection or non-detection? In this sense, occupancy models are “hierarchical” models in that there is a process that first establishes the distribution of targets (the true presence or absence of the target), and then there is an observation process of the resulting distribution (the detection or non-detection of the target). As researchers, the data we have in hand are the latter – the detection or non-detection of target species across surveys. From that information, we try to understand the former – the true presence or absence of the target, and variables that might influence this pattern.

As an example, let’s assume you are interested in developing an occupancy model for trolls of Middle Earth. You select 75 study sites, and set out to survey each site three times in quick succession (or quickly enough to assume that the site does not change in occupancy status between surveys). Alternatively, if a site is large enough, it could be sampled at three different locations in the same time period.

In any event, the species is recorded as being detected (1) or not (0) for each site based on your survey efforts. Let’s assume that on site 1 a troll was detected on the first survey, but not detected on the second or third survey. This is called a detection history or encounter history for site 1, and would be recorded as ‘1-0-0’. In the same way, the detection history for each site is determined. A history of ‘1-1-0’ means that a species of interest was detected on the first and second samples, but not on sample 3. A history of ‘0-1-0’ means that the species was detected on the second sampling occasion, but not on the first or third occasion.

How many possible histories are there? Because a particular sample occasion has only two outcomes (0 or 1), and there are 3 sampling occasions, there are \(2^3\) = 8 kinds of histories in total. These histories are:

# all possible combinations of 0:1 for 3 surveys
htable <- expand.grid("Survey 1" = 1:0, "Survey 2" = 1:0, "Survey 3" = 1:0)
htable$History <- apply(htable, MARGIN = 1, FUN = paste, collapse = "")
htable[9,] <- NA
knitr::kable(htable[1:8, ],
             caption = "List of all possible encounter histories for 3 surveys.",
             align = 'c') %>%
 kable_classic(lightable_options = "basic") 
Table 7.1: List of all possible encounter histories for 3 surveys.
Survey 1 Survey 2 Survey 3 History
1 1 1 111
0 1 1 011
1 0 1 101
0 0 1 001
1 1 0 110
0 1 0 010
1 0 0 100
0 0 0 000

If there were four sampling occasions instead of three, there would be \(2^4\) = 16 possible detection histories. If there were five sampling occasions, there would be \(2^5\) = 32 possible detection histories. And if there were six sampling occasions, there would be \(2^6\) = 64 possible detection histories.
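This doubling pattern is easy to confirm in R (a throwaway check, not part of the book’s analysis code):

```r
# with k surveys there are 2^k possible detection histories:
# enumerate them with expand.grid, then count the rows
for (k in 3:6) {
  n_hist <- nrow(expand.grid(rep(list(0:1), k)))
  cat(k, "surveys:", n_hist, "possible histories\n")
}
```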

Remember, the data we have are encounter histories that provide information about detection or non-detection. If a target was detected in any of the surveys at a site, we also know that the target was truly present. However, a history in which the target was undetected across all surveys (i.e., ‘0-0-0’) is unique. It means only that the target was undetected across surveys; it does not tell us whether the target was present or absent. Occupancy modeling allows us to take in our detection data and uncover what we are really interested in: true patterns of presence or absence of a target.

Let’s assume you sampled all 75 sites three times and recorded the frequency of each kind of capture history.

# read in the data
trolls <- read.csv("tutorials/simulation/trolldata_ss.csv")

# assign encounter histories to each site
dethist <- spread(trolls[,c('ValianDate','siteID','detection')], ValianDate, detection)
hists <- apply(dethist[-c(1, 5, 6)], MARGIN = 1, FUN = paste, collapse = "")
hists <- table(hists)

# get indices and update table
indices <- match(x = names(hists), table = htable[1:8, "History"])
htable[indices, "Frequency"] <- hists

# show the table
knitr::kable(htable[1:8, ],
             caption = "Troll encounter histories and their observed frequencies.",
             align = 'c') %>%
 kable_classic(lightable_options = "basic") 
Table 7.2: Troll encounter histories and their observed frequencies.
Survey 1 Survey 2 Survey 3 History Frequency
1 1 1 111 15
0 1 1 011 1
1 0 1 101 7
0 0 1 001 1
1 1 0 110 3
0 1 0 010 1
1 0 0 100 2
0 0 0 000 45

For example, 15 sites had a history of 111, 7 sites had a history of 101, and so on. This is a typical “single-season” occupancy data-set. It’s called a single-season data-set because all surveys were conducted within a short time frame (a single season).

7.3 Model assumptions

Now that you understand the basics behind the data collection, it’s time to review the assumptions of the single-season occupancy model and the conditions in which the detection-non-detection data are collected:

  • The system is demographically closed to changes in the occupancy status of a site across survey periods. At the species level, this means that a species cannot colonize/immigrate to a site, or go locally extinct/emigrate from that site, during the course of the study.

  • Species are not falsely detected.

  • Detection at a site is independent of detection at other sites. This means that your sites should be far enough apart to be biologically independent.

The multi-season occupancy model handles violations of demographic closure, and the misidentification model handles false detections; both are discussed later in the book.

7.4 The saturated model

OK, now what? Let’s start by finding the probability of getting a ‘1-0-0’, ‘1-1-1’, ‘1-0-1’, etc. history directly from the raw data. That is, we’ll start with the saturated model. How do we compute these probabilities? Simple! Just divide the frequency of each history by the total number of sites:

# compute probabilities of each detection history
htable2 <- htable
htable2$Probability = round(htable2$Frequency/75, digits = 3)

# add the column sums
sums <- t(colSums(htable2[1:8, c(5, 6)]))
indices <- match(colnames(sums), colnames(htable2))
htable2[9,indices] <- round(sums)

# print the table
knitr::kable(htable2,
             align = 'c', 
             caption = "Raw history probabilities of encounter histories.") %>%
 kable_classic(lightable_options = "basic") %>%
 row_spec(9, bold = TRUE, background = 'lightgray')
Table 7.3: Raw history probabilities of encounter histories.
Survey 1 Survey 2 Survey 3 History Frequency Probability
1 1 1 111 15 0.200
0 1 1 011 1 0.013
1 0 1 101 7 0.093
0 0 1 001 1 0.013
1 1 0 110 3 0.040
0 1 0 010 1 0.013
1 0 0 100 2 0.027
0 0 0 000 45 0.600
75 1.000

For example, the history ‘0-1-1’ has a probability of 0.013, which is 1 divided by the total frequency of 75. The probability of getting a ‘1-1-1’ history is 0.2, and so on. These are simple proportions, and they are the maximum likelihood probabilities based on the raw data. You can’t get better probability estimates than this . . . they “fit” the data perfectly.

Can we define this “fit” with a single quantity? In Chapter 5, we introduced the multinomial probability function with dice rolling as an example. If you rolled a die 100 times, you could count how many times you rolled a ‘1’, ‘2’, . . . ‘6’, and then compute the most likely probabilities of rolling each face.

Occupancy encounter histories and frequencies are no different. Hopefully you recognize that the histories (e.g., ‘1-0-0’, ‘1-1-0’, etc.) are like an 8-faced die. We have “rolled” a 111 history 15 times. We have “rolled” a 101 history 7 times.

We assume that you’ve already read the maximum likelihood chapter, but if you’re rusty, take time to review it now. The multinomial likelihood function has the form:

\[\mathcal{L}(p_i ; n, y_i) = {n \choose y_1, y_2, \ldots, y_8} p_1^{y_1} p_2^{y_2} p_3^{y_3} ...p_8^{y_8} \tag{7.1}\]

This function states that the likelihood of \(p_i\) (the probability of detection history \(i\)), given \(n\) (the total number of sites sampled) and \(y_i\) (the frequency of each type of detection history), is equal to the multinomial coefficient \({n \choose y_1, \ldots, y_8}\) multiplied by the probability of history 1 raised to its frequency, multiplied by the probability of history 2 raised to its frequency, and so on until all 8 histories are accounted for. Remember, our table has 8 different histories.

Commonly, the natural log version of this formula is used, and the multinomial coefficient is dropped because it doesn’t contain any parameters of interest. We know \(n\) and \(y_i\), but we’re interested in the unknown \(p_i\). The log-likelihood function has the form:

\[log(\mathcal{L}(p_i | n, y_i)) \propto y_{1} log(p_1) + y_{2} log(p_2) + y_{3} log(p_3) +...+ y_{8} log(p_8) \tag{7.2}\]

This function says that the log likelihood of the parameters, \(p_i\), given \(n\) (the total sites sampled) and \(y_i\) (the frequency of each history) is proportional to \(y_1\) times log(\(p_1\)) + \(y_2\) times log(\(p_2\)) + …\(y_8\) times log(\(p_8\)). “Log” here is the natural log.

So, for a single-season occupancy model, what is the log likelihood of the saturated model? Well, we know the frequencies for each history, and we just computed the probabilities for each history based on the raw data. We can re-write the multinomial log likelihood as:

\[log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{p_{111}}) + y_{\color{blue}{011}} log(\color{brown}{p_{011}}) + y_{\color{blue}{101}} log(\color{brown}{p_{101}}) +...+ y_{\color{blue}{000}} log(\color{brown}{p_{000}}) \tag{7.3}\]

To solve this, we need to take the natural logs of each probability (calculated in column 7 below), and then multiply each history frequency by the natural log of its probability (calculated in column 8 below), and add it all up.

# compute probabilities of each detection history
htable3 <- htable2

# append the log probability
htable3[1:8, "LnProb"] = round(log(htable3[1:8, "Probability"]), digits = 3)

# append the product
htable3[1:8, "Product"] = htable3[1:8, "LnProb"] * htable3[1:8, "Frequency"]

# add the column sums
sums <- t(colSums(htable3[1:8, c(7:8)]))
indices <- match(colnames(sums), colnames(htable3))
htable3[9,indices] <- round(sums, digits = 3)

# print the table
knitr::kable(htable3,
             caption = "The saturated model's log likelihood is the sum of the final column.",
             align = "c") %>%
 kable_classic(lightable_options = "basic") %>%
  row_spec(9, bold = TRUE, background = "lightgray")
Table 7.4: The saturated model’s log likelihood is the sum of the final column.
Survey 1 Survey 2 Survey 3 History Frequency Probability LnProb Product
1 1 1 111 15 0.200 -1.609 -24.135
0 1 1 011 1 0.013 -4.343 -4.343
1 0 1 101 7 0.093 -2.375 -16.625
0 0 1 001 1 0.013 -4.343 -4.343
1 1 0 110 3 0.040 -3.219 -9.657
0 1 0 010 1 0.013 -4.343 -4.343
1 0 0 100 2 0.027 -3.612 -7.224
0 0 0 000 45 0.600 -0.511 -22.995
75 1.000 -24.355 -93.665

The sum of the final column is the saturated model’s LogLike, or -93.665. Jot this number down on a sheet of paper, as we’ll return to it later.

What’s the big deal? Well, because the probabilities for each history were computed from the raw data, the saturated model fits the data perfectly. The saturated model’s LogLikelihood therefore sets the “standard” upon which other models will be based.

As we mentioned in Chapter 5, if we multiply this value by -2, the result is the saturated model’s -2LogLike. The reason for the minus sign is that it allows a computer to use a ‘minimize’ function in order to compute the maximum log-likelihood. The ‘2’ makes model comparisons easier, as comparing models involves twice the difference in log-likelihood values.
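The table arithmetic is also easy to reproduce in a few lines of R (a quick sketch using the frequencies above; it returns roughly -93.59 rather than -93.665 because the table rounded each LnProb to three digits before multiplying):

```r
# observed history frequencies, in table order: 111, 011, 101, 001, 110, 010, 100, 000
freq  <- c(15, 1, 7, 1, 3, 1, 2, 45)
probs <- freq / sum(freq)           # saturated (raw-data) history probabilities

loglike <- sum(freq * log(probs))   # the saturated model's LogLike
neg2ll  <- -2 * loglike             # the saturated model's -2LogLike
```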

7.5 Occupancy model parameters

Since the saturated model fits perfectly, you might wonder why we need other models. The problem with the saturated model is that it doesn’t give us any information about our main parameter of interest, site occupancy. So, the idea is to build models of how those data could be generated as a function of meaningful parameters: the probability of occupancy (\(\psi\)) and the probability of detection (\(p\)).

Now that we know the saturated model’s LogLike and -2LogLike, we can forge ahead and estimate the probability of getting a ‘1-1-1’, ‘1-1-0’, etc., not from the raw data, but by estimating the occupancy model parameters: \(\psi\), \(p_1\), \(p_2\), and \(p_3\). What are these? These are the parameters of the single-season occupancy model. Since we surveyed our sites 3 times, there are three detection parameters (\(p_1\), \(p_2\), and \(p_3\)), plus psi (\(\psi\)), which is the actual probability that a site is occupied.

Let’s define the occupancy parameters more formally:

  • psi (\(\psi\)) is the probability that a site is occupied by the target of interest. It is the probability that at least one individual of the target is present at a site;
  • \(p_1\) is the probability that a target is detected at a site during survey 1, given that the site is actually occupied;
  • \(p_2\) is the probability that a target is detected at a site during survey 2, given that the site is actually occupied;
  • \(p_3\) is the probability that a target is detected at a site during survey 3, given that the site is actually occupied.

The \(\psi\) parameter is about the “state” of the site, while the next three parameters are about the “observation” of that state.

It’s a bit unfortunate that the detection parameters are labeled \(p_1\), \(p_2\), and \(p_3\) because it’s easy to confuse these with the \(p_i\)’s in the multinomial function. Just keep in mind that when we say \(p_1\), \(p_2\), or \(p_3\) we are usually referring to the probability of detecting a species during occasion 1, 2, or 3, unless otherwise noted.

In a nutshell, we will estimate the probability of observing a particular history by combining estimates of \(\psi\), \(p_1\), \(p_2\), and \(p_3\), rather than estimating the probabilities from the raw data as we did for the saturated model.

For instance, the probability of observing a ‘1-1-1’ history is \(\psi*p_1*p_2*p_3\) - we know the site was occupied (\(\psi\)) because the species was observed on at least one occasion, and it was detected on the first (\(p_1\)), second (\(p_2\)), and third surveys (\(p_3\)).

When writing out encounter history probabilities, it’s helpful to write them hierarchically; that is, start with the state of the site (occupied or unoccupied), and then include the observation of that state on a survey-by-survey basis. For instance, the encounter history probability for a site with a history of ‘1-0-0’ is the product of:

  • The state:

    • \(\psi\) since the target was detected at least once, the site must have been occupied
  • The observation of that state:

    • \(p_1\) since it was detected in the first survey
    • \((1-p_2)\) since it was not detected in the second survey
    • \((1-p_3)\) since it was not detected in the third survey

The result is \[\psi * p_1 * (1-p_2) * (1-p_3).\]

Let’s try one more, history 101. We know the site was occupied because the species was detected in at least one survey, and we know it was detected on the first and third survey but not the second. The probability of getting a ‘1-0-1’ history is therefore

\[\psi * p_1 * (1-p_2) * p_3.\]

Go ahead and work your way through the remaining histories (‘0-1-1’, ‘0-0-1’, ‘0-1-0’, ‘1-1-0’, ‘0-0-0’). Don’t skip this step! There’s something “healthy” about writing out detection history probabilities - it forces the concepts of the model to really sink in. The only tricky history is ‘0-0-0’.

The ‘0-0-0’ history is “tricky” only in that there are 2 ways in which this history could be generated:

  1. The true state is occupied, but the target was missed on all three surveys, which would be written: \[\psi * (1-p_1) * (1-p_2) * (1-p_3).\]
  2. The true state is unoccupied, in which case detection is impossible. This history is given by: \[(1-\psi).\]

Thus the probability of getting a ‘000’ history needs to account for both possibilities, and so we add the two alternatives:

\[\psi*(1-p_1)*(1-p_2)*(1-p_3) + (1-\psi).\]

The full table of history probabilities is shown below:

# compute probabilities of each detection history
htable2 <- htable[1:8,1:5]
 htable2[,"Probability"] <- "NA"
 htable2[1, "Probability"] <- "psi * p~1~ * p~2~ * p~3~"
 htable2[2, "Probability"] <- "psi * (1-p~1~) * p~2~ * p~3~"
 htable2[3, "Probability"] <- "psi * p~1~ * (1-p~2~) * p~3~"
 htable2[4, "Probability"] <- "psi * (1-p~1~) * (1-p~2~) * p~3~"
 htable2[5, "Probability"] <- "psi * p~1~ * p~2~ * (1-p~3~)"
 htable2[6, "Probability"] <- "psi * (1-p~1~) * p~2~ * (1-p~3~)"
 htable2[7, "Probability"] <- "psi * p~1~ * (1-p~2~) * (1-p~3~)"
 htable2[8, "Probability"] <- "psi * (1-p~1~) * (1-p~2~) * (1-p~3~) + (1-psi)"
htable2$LnProb <-  "Ln(Prob)"
htable2$Product <-  "Ln(Prob) * Freq"

# print the table
knitr::kable(htable2,
             caption = "The occupancy model's log likelihood is the sum of the final column.",
             align = "c") %>%
 kable_classic(lightable_options = "basic") 
Table 7.5: The occupancy model’s log likelihood is the sum of the final column.
Survey 1 Survey 2 Survey 3 History Frequency Probability LnProb Product
1 1 1 111 15 psi * p1 * p2 * p3 Ln(Prob) Ln(Prob) * Freq
0 1 1 011 1 psi * (1-p1) * p2 * p3 Ln(Prob) Ln(Prob) * Freq
1 0 1 101 7 psi * p1 * (1-p2) * p3 Ln(Prob) Ln(Prob) * Freq
0 0 1 001 1 psi * (1-p1) * (1-p2) * p3 Ln(Prob) Ln(Prob) * Freq
1 1 0 110 3 psi * p1 * p2 * (1-p3) Ln(Prob) Ln(Prob) * Freq
0 1 0 010 1 psi * (1-p1) * p2 * (1-p3) Ln(Prob) Ln(Prob) * Freq
1 0 0 100 2 psi * p1 * (1-p2) * (1-p3) Ln(Prob) Ln(Prob) * Freq
0 0 0 000 45 psi * (1-p1) * (1-p2) * (1-p3) + (1-psi) Ln(Prob) Ln(Prob) * Freq

The sum of the eight encounter history probabilities must equal 1 (since we’ve accounted for every possible combination).
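This is easy to verify numerically. Below is a minimal sketch (the function name hist_prob and the parameter values are our own, chosen purely for illustration) that computes the probability of any 3-survey history and confirms that the eight probabilities sum to 1:

```r
# probability of a detection history h (a vector of 0s and 1s),
# given occupancy psi and survey-specific detection probabilities p
hist_prob <- function(h, psi, p) {
  occupied <- psi * prod(ifelse(h == 1, p, 1 - p))     # occupied and observed as h
  if (all(h == 0)) occupied + (1 - psi) else occupied  # 000 also arises if unoccupied
}

# all 8 possible histories for 3 surveys
h8 <- expand.grid(0:1, 0:1, 0:1)

# the probabilities sum to 1 (up to floating point) for any valid psi and p
total <- sum(apply(h8, 1, hist_prob, psi = 0.5, p = c(0.4, 0.5, 0.6)))
```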

7.6 Occupancy MLEs

Once we have written out the encounter history probabilities in terms of the 4 occupancy parameters, our multinomial log likelihood function

\[log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{p_{111}}) +\\ y_{\color{blue}{011}} log(\color{brown}{p_{011}}) + \\y_{\color{blue}{101}} log(\color{brown}{p_{101}}) +...+ \\y_{\color{blue}{000}} log(\color{brown}{p_{000}}) \tag{7.4}\]

becomes
\[log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{\psi*p_1*p_2*p_3}) +\\ y_{\color{blue}{011}} log(\color{brown}{\psi*(1-p_1) * p_2 * p_3}) + \\y_{\color{blue}{101}} log(\color{brown}{\psi * p_1*(1-p_2)*p_3}) +... +\\ y_{\color{blue}{000}} log(\color{brown}{\psi*(1-p_1)*(1-p_2)*(1-p_3) + (1-\psi)}) \tag{7.5}\]

Our goal is to find the combination of values of \(\psi, p_1, p_2, p_3\) which yield the maximum value of the log-likelihood function listed above.

One method is to try every possible combination of values for the parameters, compute the log-likelihood value, then choose the combination of parameters which gave the maximum value. For example, let’s compute the log likelihood for \(\psi\) = 0.7, \(p_1\) = 0.8, \(p_2\) = 0.9, and \(p_3\) = 1:

\[log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{0.7*0.8*0.9*1}) + \\y_{\color{blue}{011}} log(\color{brown}{0.7*(1-0.8) * 0.9 * 1}) + \\y_{\color{blue}{101}} log(\color{brown}{0.7 * 0.8*(1-0.9)*1}) +... + \\y_{\color{blue}{000}} log(\color{brown}{0.7*(1-0.8)*(1-0.9)*(1-1) + (1-0.7)}) \tag{7.6}\]

If you did the math, you would obtain the log likelihood for this particular combination of parameter values, and the log likelihood could be similarly computed for other combinations too. We input a certain combination of parameters, which in turn determines the history probabilities, which in turn determine the natural log probabilities, which in turn determine the products of the log probabilities and the history frequencies, which in turn determine the final LogLike value:

\[ \text{parameters} \rightarrow \text{history probabilities} \rightarrow \text{log probabilities} \rightarrow \\ \text{product} \rightarrow \text{multinomial log likelihood}\]

We seek the combination of parameters that maximizes the LogLike value. In doing so, we are finding parameter values that best describe the patterns of 0’s and 1’s that were observed in our surveys. The parameter values that result in the maximum LogLike are called “Maximum Likelihood Estimates”, or “MLEs”, and we base our scientific inferences on these estimates.

In Chapter 5 we mentioned how computers excel at finding the maximum of a function. They also excel at finding the minimum of a function. The package RPresence approaches the problem by minimizing the -2LogLike function instead of maximizing the LogLike function (the MLEs are identical).

In Chapters 4 and 5, we mentioned that optimization routines often operate on inputs that are unconstrained, and this is true in the package RPresence as well. RPresence does not work on the constrained occupancy parameters \(\psi\), \(p_1\), \(p_2\), and \(p_3\), which are all probabilities bound between 0 and 1. Instead, it works on unconstrained beta parameters (\(\beta\)s). Thus, RPresence finds the betas that minimize the -2LogLike.

\[ \text{betas} \rightarrow\text{parameters} \rightarrow \text{history probabilities} \rightarrow \text{log probabilities} \rightarrow \\ \text{product} \rightarrow \text{multinomial log likelihood}\rightarrow \text{-2LogLike}\]
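Before handing the job to RPresence, the entire chain can be sketched in plain R (a rough illustration only; the function name neg2loglike, the use of plogis() as the inverse-logit link, and the call to optim() are our choices, not RPresence’s actual machinery):

```r
# history frequencies, in table order: 111, 011, 101, 001, 110, 010, 100, 000
y <- c(15, 1, 7, 1, 3, 1, 2, 45)

# betas -> parameters -> history probabilities -> -2LogLike
neg2loglike <- function(beta, y) {
  pars <- plogis(beta)               # inverse-logit: unconstrained betas to (0,1)
  psi  <- pars[1]
  p    <- pars[2:4]
  hp <- apply(expand.grid(1:0, 1:0, 1:0), 1, function(h) {
    pr <- psi * prod(ifelse(h == 1, p, 1 - p))
    if (all(h == 0)) pr + (1 - psi) else pr    # 000 can also mean unoccupied
  })
  -2 * sum(y * log(hp))
}

# minimize -2LogLike over the four betas, then back-transform to the MLEs
fit  <- optim(par = rep(0, 4), fn = neg2loglike, y = y)
mles <- plogis(fit$par)              # estimates of psi, p1, p2, p3
```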

Well, you may ask, what are the MLEs for the troll data? It’s finally time to run your first analysis in the program RPresence.

7.7 RPresence

Hopefully RPresence is installed on your machine, so we simply need to load it into R so that its functions are available to us:

library(RPresence)
The troll data that we have been exploring are included in RPresence directly as a CSV (comma separated values) file. We can read it in with the following code:

# read in the data
trolls <- read.csv(system.file("extdata/trolls.csv",  package = "RPresence"),
                     header = TRUE)
# peek at the data
head(trolls)
##    siteID ValianDate region detection city hill river forest wetland hill15
## 1 1085441 1200/01/10   Arda         0    0    Y  0.00    0.0   0.000  0.443
## 2 1085441 1200/01/11   Arda         1    0    Y  0.00    0.0   0.000  0.443
## 3 1085441 1200/01/12   Arda         1    0    Y  0.00    0.0   0.000  0.443
## 4 1085441 1200/01/13   Arda         0    0    Y  0.00    0.0   0.000  0.443
## 5 1085441 1200/01/14   Arda         0    0    Y  0.00    0.0   0.000  0.443
## 6  985373 1200/01/10   Arda         0    0    N  0.03    0.1   0.007  0.083
##   stonefields northing random easting observer
## 1           0 576318.6  0.838 1735759  Legolas
## 2           0 576318.6  0.838 1735759  Legolas
## 3           0 576318.6  0.838 1735759   Gollum
## 4           0 576318.6  0.838 1735759   Gollum
## 5           0 576318.6  0.838 1735759   Gollum
## 6           0 634318.6  0.047  673759  Aragorn

The head() function shows us the first 6 rows of the troll data-set. Each site in Middle Earth has a unique ID, such as “1085441”. The data-set shows the detection or non-detection of trolls for each site on a given date by a given observer. For instance, the first record shows the site was surveyed on 1200/01/10 by Legolas, who did not detect any trolls (“detection” = 0). Legolas surveyed this same site the next day, 1200/01/11, when he detected a troll, as did Gollum on 1200/01/12. Perhaps Gollum spotted the same troll as Legolas, but perhaps not. The data only convey the detection or non-detection of a target.

The data-set also contains conditions associated with each site. For example, the column “city” indicates whether a site is within 30km of any Middle Earth city. There are other columns as well that describe site characteristics. We won’t be dealing with these columns in this chapter, but will revisit them in Chapter 8 where we will allow site occupancy to be a function of such variables.

7.8 createPao()

The basic input for nearly every analysis in RPresence is an R object of class “pao”, which stands for “presence-absence object”. A “pao” object is created with the createPao() function. The output of this function is a large R object that contains all of the information needed to run an analysis in RPresence.

It is well worth your time to look at this function’s help page:

?createPao
The function’s description tells us that it “Creates a pao data object from R variables”. The function has several arguments, and many are populated with default values as indicated by the equal sign.

## function (data, unitcov = NULL, survcov = NULL, nsurveyseason = ncol(data), 
##     nmethods = 1, frq = rep(1, nrow(data)), title = "PRESENCE Analysis", 
##     unitnames = paste0("unit", 1:nrow(data)), paoname = "pres.pao") 

Let’s explore the first argument, which is the only argument that does not have a default value:

  • data = data frame containing the detection (=1) and non-detection (=0) data with observations from a sampling unit along a row, and surveys in the columns. Missing values are allowed (=NA).

Let’s stop there and shape our troll data into the format that createPao() needs. Below, we use the spread() function in the package tidyr to turn our troll data-set from “long” format to “wide” format, where dates are spread across columns and sites are listed down rows. Thus, “ValianDate” is the “key” argument to the spread() function, and “detection” is the “value” argument to the spread() function. Have a look:

# assign encounter histories to each site
det_hist <- tidyr::spread(data = trolls[,c('ValianDate','siteID','detection')],
                          key = ValianDate, 
                          value = detection)

head(det_hist, n = 10)
##    siteID 1200/01/10 1200/01/11 1200/01/12 1200/01/13 1200/01/14
## 1   27530          0          0          0          0          0
## 2   50339          1          1          1          0          1
## 3   65048          0          0          0          0          0
## 4   79081          1          1          1          1          1
## 5   82884          0          0          0          0          0
## 6  103396          1          0          0          1          0
## 7  108217          0          0          0          0          0
## 8  114488          1          1          1          1          1
## 9  149175          0          0          0          0          0
## 10 156266          1          0          1          1          1

You can confirm that each site is listed by row, and detections for that site are provided by the different columns. Notice now that the format of the data differs from the summarized frequencies shown in many of the introductory tables.

This particular data-set gives results of 5 surveys. For our purposes, we will only include the first three surveys to match up with our introductory prose. Columns 2, 3, and 4 are sufficient to input to the createPao() function’s data argument. At this point, we have enough information to use the createPao() function and create an R object of class “pao”:

# trim the dataset to just three surveys
data <- det_hist[-c(1, 5, 6)]

# send the data to the createPao function
trollPao <- createPao(data = data)

# look at the structure of the returned object; note the class is "pao"
str(trollPao, max.level = 1)
## List of 15
##  $ nunits       : int 75
##  $ nsurveys     : int 3
##  $ nseasons     : int 1
##  $ nmethods     : num 1
##  $ det.data     :'data.frame':   75 obs. of  3 variables:
##  $ nunitcov     : int 0
##  $ unitcov      :'data.frame':   0 obs. of  0 variables
##  $ nsurvcov     : int 1
##  $ survcov      :'data.frame':   225 obs. of  1 variable:
##  $ nsurveyseason: int 3
##  $ title        : chr "PRESENCE Analysis"
##  $ unitnames    : chr [1:75] "unit1" "unit2" "unit3" "unit4" ...
##  $ surveynames  : chr [1:3] "1-1" "1-2" "1-3"
##  $ paoname      : chr "pres.pao"
##  $ frq          : num [1:75] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "class")= chr "pao"

As you can see, an R object of class “pao” is a list with 15 elements. Many of these elements were filled in with default values used by the createPao() function. Let’s work our way through this list:

  • $ nunits = 75. There are 75 total sites.
  • $ nsurveys = 3. There are 3 surveys in our data-set.
  • $ nseasons = 1. This is a single-season occupancy model, so the number of seasons is 1.
  • $ nmethods = 1. This is the number of methods used for detection. We’ll discuss it later.
  • $ det.data = a data frame with 3 variables. We’ll look at this in a minute, but it is our encounter history data that we fed into the function.
  • $ nunitcov = 0. Each site has 0 site-level covariates associated with it in the unitcov list element, described next.
  • $ unitcov = a dataframe that contains information about site-level covariates, such as elevation, hills, forest cover, etc. It’s an empty dataframe at the moment because we did not send the createPao() function any site-level covariate data.
  • $ nsurvcov = 1. There is just one survey covariate in the survcov dataframe, described next.
  • $ survcov = a dataframe with 1 variable. We’ll look at this in a minute.
  • $ nsurveyseason = 3. There are 3 surveys per season.
  • $ title = “PRESENCE Analysis”.
  • $ unitnames = These are the default site names because we did not include site names in our input data-set to createPao(). We’ll change these in a minute.
  • $ surveynames = These are the default names of the surveys. The notation 1-1 means “season 1, survey 1”; 1-2 means “season 1, survey 2”, etc.
  • $ paoname = “pres.pao”. This is a filename to save the input data for use with Presence (Windows version).
  • $ frq = a vector of 1’s. The data-set we input provides each site’s survey information on a row by row basis. Thus, the frequency associated with each site is 1.

We can use list-indexing to change list elements, if desired. Here, we’ll change the title and site names:

# change the title
trollPao$title <- "Troll of Middle Earth Single-Season Occupancy Data"

# change the site names
trollPao$unitnames <- det_hist$siteID

Let’s now inspect a few elements of this object. The list element named det.data is the encounter histories that we input to the function.

head(trollPao$det.data)
##   1200/01/10 1200/01/11 1200/01/12
## 1          0          0          0
## 2          1          1          1
## 3          0          0          0
## 4          1          1          1
## 5          0          0          0
## 6          1          0          0

The list element named survcov also stores a dataframe. This dataframe has just one column, called SURVEY, and it has 75 sites * 3 surveys = 225 rows. It indicates which survey each of the detection-history codes belongs to. It’s ordered by column, so the first 75 values of the covariate are 1’s for survey 1, the next 75 are 2’s for survey 2, and the last 75 are 3’s for survey 3. The first 3 and last 3 are shown below:

head(trollPao$survcov, n = 3)
##   SURVEY
## 1      1
## 2      1
## 3      1
tail(trollPao$survcov, n = 3)
##     SURVEY
## 223      3
## 224      3
## 225      3
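
Because the covariate is stacked by column, you could reproduce its values yourself in base R. A minimal sketch (the values, not the dataframe wrapper, are what matter here):

```r
# Rebuild the SURVEY covariate by hand: 75 sites per survey, stacked column-wise
nsites   <- 75
nsurveys <- 3
SURVEY   <- rep(1:nsurveys, each = nsites)

head(SURVEY, n = 3)   # first three values are 1 (survey 1)
tail(SURVEY, n = 3)   # last three values are 3 (survey 3)
length(SURVEY)        # 225 values in total
```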

The summary() function will provide an overview of the pao object we created. The naive occupancy rate is given as 0.4. This is called “naive” because it is the probability of occupancy if you assume that detection was “perfect” (i.e., you never miss detecting the species when it is present). It is computed as the proportion of sites that had at least one detection across the 3 surveys. This is not the true occupancy rate (unless detection is perfect, that is).

summary(trollPao)
## paoname=pres.pao
## title=Troll of Middle Earth Single-Season Occupancy Data
## Naive occ=0.4
##        nunits      nsurveys      nseasons nsurveyseason      nmethods 
##          "75"           "3"           "1"           "3"           "1" 
##      nunitcov      nsurvcov 
##           "0"           "1" 
## unit covariates  :  
## survey covariates: SURVEY

See if you can match up the summary() output with items in the pao list.
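
The naive occupancy calculation is easy to verify by hand. Here is a base-R sketch using a toy detection-history matrix (not the troll data):

```r
# Toy detection histories: 5 sites x 3 surveys (1 = detected, 0 = not detected)
dh <- matrix(c(0, 0, 0,
               1, 1, 1,
               0, 0, 0,
               1, 1, 1,
               1, 0, 0),
             ncol = 3, byrow = TRUE)

# A site is "naively occupied" if it has at least one detection across surveys
naive_occ <- mean(rowSums(dh) > 0)
naive_occ   # 3 of 5 sites had a detection, so 0.6
```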

To summarize so far, the createPao() function in RPresence is used to shape up your data for occupancy analysis. Importantly, almost every single analysis that you will run in RPresence begins with the createPao() function. You will save yourself many headaches if you understand how this object is structured.

7.9 occMod()

At last! Now that we have created our input pao object, we can run the single season occupancy analysis and obtain multinomial maximum likelihood estimates for \(\psi\), \(p_1\), \(p_2\), and \(p_3\). In RPresence, the main function for running occupancy analyses is the occMod() function. This is a workhorse function and we will re-visit it in many future chapters. If you haven’t guessed, our first stop is the function’s help page:


This function fits a wide variety of occupancy models using program Presence.

## function (model, cov.list = NULL, data, type = NULL, conf = 0.95, 
##     modname = NULL, paoname = NULL, outfile = NULL, fixed = NULL, 
##     initvals = NULL, modfitboot = NULL, VCoutopt = "realbetavc", 
##     noDerived = F, randinit = 0, maxfn = "32000", quiet = 1, 
##     limit.real = F, threads = 1, ...) 

There are several arguments to the occMod() function, and many have default values. Can you spot them? We can view the definitions and defaults for the parameters by using the help function (help(‘occMod’)), but here’s an overview:

  • model = list of formulas, one for each parameter. In general, we need a formula for \(\psi\) and a formula for \(p\).
  • cov.list = list of covariates used in the model.
  • data = pao data object (object of class pao).
  • type = a string indicating the type of occupancy model to fit. In this chapter, we are fitting a single season occupancy model, which has the type of “so” (static occupancy).
  • conf = level for confidence interval (may be vector valued). The default is 0.95.
  • modname = (optional) a string with the user supplied name for a model. If NULL, modname is created from the formulas supplied in model.
  • paoname = (optional) a string with the user supplied filename for PRESENCE data and output files. If NULL, a generated name is used.
  • outfile = name for output file (use outfile=‘modname’ for outfile named via model name.)
  • fixed = a data.frame with variables, “param” and “value”. param should be a vector of string values containing the names of the real parameters (eg., “p(2)”) to fix, and value should be a numeric vector of the values the real parameter should be fixed at. For example, to fix p(2) and p(3) to zero: fixed = data.frame(param = c(‘p(2)’, ‘p(3)’), value = c(0, 0))
  • initvals = a vector providing initial values for regression coefficients for the optimization procedure used by PRESENCE.
  • modfitboot = number of bootstraps for assessing model fit. Only works for type = “so”. We will examine goodness of fit in a later chapter.
  • VCoutopt = option for PRESENCE computation and output of beta/real var-cov matrices.
  • noDerived = if TRUE, doesn’t print derived estimates from model.
  • randinit = number of random initial starting value vectors to try (default=0)
  • maxfn = max number of function calls before aborting optimization
  • quiet = 0 = normal output from Presence, 1 = no output (for simulations or repeated runs)
  • limit.real = limit output of real parameter estimates to 1st 100 sites (default = FALSE)
  • threads = number of cpu threads to use in optimization (default = 1)

Some of these arguments (such as outfile) make sense only if you remember that RPresence collects the input data, then sends the information to Presence (which came bundled with RPresence), which does the actual analysis and then sends results back to R. If you are a Presence user, the outfile may be of interest to you. If not, no worries!

We can run our single season analysis of trolls by providing just 3 arguments (using default values for the remaining arguments):

  1. We send our recently created pao object (trollPao) to occMod()’s data argument.
  2. We set the type argument to “so”, for static occupancy model.
  3. The model argument specifies the model to be run, and is sent as a list that has formula notation:
list(psi ~ 1, p ~ SURVEY)
## [[1]]
## psi ~ 1
## [[2]]
## p ~ SURVEY

The list has one list element for each parameter of a given model. For single season occupancy analysis of troll data, we provide a formula for \(\psi\) and just one formula for \(p\).

  • \(\psi\) ~ 1 provides the formula for \(\psi\). The “1” is a placeholder for “intercept”. It is called an “intercept” model because \(\psi\) is not a function of covariates (i.e., \(\psi\) is the same for all sites). For example, if we wish to set \(\psi\) as a function of elevation, the formula would be \(\psi\) ~ elevation. \(\psi\) ~ 1 will simply return the intercept (a single estimate) for \(\psi\), which would apply to all sites. \(\psi\) ~ elevation will return two parameters, an intercept and a term for the effect of elevation. The intercept is implied if it is not included in the formula.

  • \(p\) ~ SURVEY. For our troll data, we have 3 survey events. This is the notation we use if we wish to estimate \(p_1\), \(p_2\), and \(p_3\) uniquely. The term SURVEY should ring a bell – our pao object stores a dataframe called survcov, which contains a single column called SURVEY, which was made by default. In fact, if you read the Details section of the helpfile, SURVEY is a pre-defined covariate name reserved by occMod() (make sure to steer clear of these names for your personal use). If we wish to run a model where we force \(p_1\) = \(p_2\) = \(p_3\), then we are estimating just a single detection parameter, and our formula would be \(p\) ~ 1, or the intercept model.

Let’s run our first model:

# run occMod, and store the result as mod1
mod1 <- occMod(model = list(psi ~ 1, p ~ SURVEY), 
               data = trollPao, 
               type = "so")

# look at the structure of the mod1
str(mod1, max.level = 1)
## List of 14
##  $ modname    : chr "psi()p(SURVEY)"
##  $ model      :List of 2
##  $ dmat       :List of 2
##  $ data       :List of 15
##   ..- attr(*, "class")= chr "pao"
##  $ outfile    : chr "del_6616_6321.out"
##  $ neg2loglike: num 188
##  $ npar       : int 4
##  $ aic        : num 196
##  $ beta       :List of 5
##  $ real       :List of 2
##  $ derived    :List of 1
##  $ gof        :List of 4
##  $ warnings   :List of 2
##  $ version    :List of 2
##  - attr(*, "class")= chr [1:2] "occMod" "so"

You can see that occMod() returns an object of class “occMod” and “so”. It is a list that contains several elements, and is described in the Value section of the helpfile:

  • $ modname = “psi()p(SURVEY)”. This is the model name created by default because we did not provide the modname argument to occMod()
  • $ model = a named list containing the right-hand formula for each real parameter type. This matches our input.
  • $ dmat = the model’s design matrix. This matrix provides crucial information for solving the optimization and defining parameters, and we will look at it at the end of the chapter.
  • $ data = the object of class pao that was sent to the function.
  • $ outfile = “del_6616_6321.out”. This is a text file containing the output from Presence. It is automatically deleted when occMod() finishes, but it can be saved by specifying a name for the outfile argument in the occMod() function call.
  • $ neg2loglike = 188. This is the minimum -2LogLike that Presence found for our troll data. If we multiply this by -0.5, we find the model’s LogLike, which is -94. Not bad, when you consider that the saturated model’s LogLike was -93.665. This means our model adequately describes the pattern of 0’s and 1’s in the encounter history. We’ll delve much deeper into the topic of goodness of fit in a future chapter.
  • $ npar = 4. This is the number of parameters that were estimated in the model. We estimated four parameters: \(\psi\), \(p_1\), \(p_2\), and \(p_3\).
  • $ aic = 196. This is computed as -2LogLike + 2* npar, or 188 + 2 * 4 = 196. We will delve deeper into the topic of AIC and model selection in a future chapter.
  • $ beta = list of estimated regression coefficients (sometimes called beta parameters), standard errors and variance-covariance matrix for each parameter type.
  • $ real = list of real parameter estimates on their natural scale (i.e., 0-1 for probabilities). A data frame for each parameter type that contains the estimate, standard error, and limits of the requested confidence intervals.
  • $ derived = list of parameter estimates that are derived from real parameters and not directly estimated.
  • $ gof = provides information about model fit, which is covered in a future chapter.
  • $ warnings = a list containing any warnings from the PRESENCE output file. We will get in the habit of looking at this.
  • $ version = a list containing the version numbers for PRESENCE and RPresence used when fitting the model
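
The arithmetic linking neg2loglike, the model’s LogLike, and aic is worth confirming by hand, using the values reported above:

```r
# Values reported by occMod() for our troll model
neg2loglike <- 188.2483
npar        <- 4

# The LogLike is -0.5 times the -2LogLike
loglike <- -0.5 * neg2loglike   # -94.12415

# AIC = -2LogLike + 2 * (number of parameters)
aic <- neg2loglike + 2 * npar   # 196.2483
```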

This object is crammed full of information: it holds all of the content that was input into occMod(), plus all of the model output and associated information.

You can use list indexing to pull certain elements out of the resulting object. However, Darryl and Jim included some “methods” in RPresence that will aid in the most common tasks. A “method” in R is a function that works on an object of a specific class. Let’s use the methods() function to return a list of methods that work on objects of class “occMod”:

methods(class = class(mod1)[1])
## [1] coef    fitted  predict summary
## see '?methods' for accessing help and source code

Here, we see four methods are listed: coef(), fitted(), predict(), and summary(). If we pass our mod1 object to any of these functions, we will be able to retrieve key components of the output. For example, let’s run the summary() function (method):

summary(mod1)
## Model name=psi()p(SURVEY)
## AIC=196.2483
## -2*log-likelihood=188.2483
## num. par=4
## Warning: Numerical convergence may not have been reached. Parameter esimates converged to approximately 7 signifcant digits.

As you can see, summary() pulls out certain elements of the mod1 list so you can quickly inspect the key results. Can you match these with the output list elements?

The warning is important: “Numerical convergence may not have been reached. Parameter estimates converged to approximately 7 significant digits.” Remember, we are trying to find combinations of parameters (betas) that minimize the -2LogLike. In the process of optimization, many calculations are performed and quantities can be very small. At some point, computer “round-off” error occurs. For example, the computer cannot represent the number 1/3 exactly, but stores it as something like 0.3333333333333. After thousands of calculations, these round-off errors can build up and the result will be inexact. Fortunately, there is a method to determine how accurate the final -2LogLike value is, and this warning is an indication of that. In this case, the warning is stating that the result of the optimization has 7 significant digits. If the number of significant digits is less than 3, it is an indication of an unstable likelihood function, and estimates might be suspect. The maximum value returned by occMod() is 7, so no worries here.

If a model does produce a low number of significant digits, there are some steps you can take to resolve it. The usual cause of the problem is a model that is too complicated for the data; the solution in that case is to try a less-complicated model, or to include more data. With complicated models, the optimization routine may also benefit from better starting values for the parameters. Starting values can be specified via the initvals argument to occMod(), or you can try a sequence of random starting values using the randinit argument.
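
You can see round-off error directly in R. The snippet below is not specific to occupancy models; it simply illustrates why thousands of floating-point operations can erode precision:

```r
# 1/3 cannot be stored exactly in binary floating point
print(1/3, digits = 17)

# A classic round-off surprise: 0.1 + 0.2 is not exactly 0.3
0.1 + 0.2 == 0.3    # FALSE
0.1 + 0.2 - 0.3     # a tiny, nonzero remainder
```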

Let’s try the coef() method to extract the model coefficients. Here, for the single season model, we pass in our mod1 object, and specify either ‘psi’ or ‘p’ for the param argument:

coef(object = mod1, param = 'psi')
##               est      se
## psi.Int -0.392999 0.23706
coef(object = mod1, param = 'p')
##              est       se
## p1.Int  2.125009 0.609335
## p2.Int -1.454080 0.711398
## p3.Int -0.775474 0.744918

Savvy readers will notice that these results can be located directly in the mod1 list element called “beta”, which provides not only the beta estimates and standard errors but also the variance-covariance matrices:

mod1$beta
## $psi
##               est      se
## psi.Int -0.392999 0.23706
## $psi.VC
##          [,1]
## [1,] 0.056197
## $p
##              est       se
## p1.Int  2.125009 0.609335
## p2.Int -1.454080 0.711398
## p3.Int -0.775474 0.744918
## $p.VC
##           B1        B2        B3
## B1  0.371289 -0.363575 -0.358669
## B2 -0.363575  0.506087  0.354935
## B3 -0.358669  0.354935  0.554903
## $VC
##           A1        B1        B2        B3
## A1  0.056197 -0.004357  0.002984  0.002102
## B1 -0.004357  0.371289 -0.363575 -0.358669
## B2  0.002984 -0.363575  0.506087  0.354935
## B3  0.002102 -0.358669  0.354935  0.554903

These beta coefficients are the key outputs for a model. These are the values of the parameters that minimize the -2LogLike function. Remember, we seek the combination of parameters that maximize the LogLike value. In doing so, we are finding parameter values that best describe the patterns of 0’s and 1’s that were observed in our surveys. The value of the parameters that result in the maximum LogLike are called “Maximum Likelihood Estimates”, or “MLEs”, and we base our scientific inferences on these estimates.

\[ \text{betas} \rightarrow\text{parameters} \rightarrow \text{history probabilities} \rightarrow \text{log probabilities} \rightarrow \\ \text{product} \rightarrow \text{multinomial log likelihood}\rightarrow \text{-2LogLike}\]
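
To make the chain above concrete, here is how the probability of a couple of encounter histories would be computed from the real parameter values. We use the MLEs reported later in this chapter; the formulas are the standard single-season ones (an all-zero history can arise from an unoccupied site, or from an occupied site that was missed on every survey):

```r
# Single-season MLEs from the troll analysis (psi, then p1, p2, p3)
psi <- 0.4029956
p   <- c(0.8933103, 0.6617111, 0.7940536)

# Pr(history 101): occupied, detected in surveys 1 and 3, missed in survey 2
pr_101 <- psi * p[1] * (1 - p[2]) * p[3]

# Pr(history 000): either unoccupied, or occupied but missed on all three surveys
pr_000 <- (1 - psi) + psi * prod(1 - p)

pr_000   # about 0.60, matching the proportion of all-zero sites (1 - naive occupancy)
```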

Hopefully you remember that betas are unconstrained parameters, and we now need to transform them back to probabilities. The fitted() method can help us here. We pass in our mod1 object and specify which parameter (‘psi’ or ‘p’) we are interested in:

head(fitted(object = mod1, param = 'psi'), n = 10)
##                  est         se lower_0.95 upper_0.95
## psi_27530  0.4029956 0.05703407  0.2978355  0.5178994
## psi_50339  0.4029956 0.05703407  0.2978355  0.5178994
## psi_65048  0.4029956 0.05703407  0.2978355  0.5178994
## psi_79081  0.4029956 0.05703407  0.2978355  0.5178994
## psi_82884  0.4029956 0.05703407  0.2978355  0.5178994
## psi_103396 0.4029956 0.05703407  0.2978355  0.5178994
## psi_108217 0.4029956 0.05703407  0.2978355  0.5178994
## psi_114488 0.4029956 0.05703407  0.2978355  0.5178994
## psi_149175 0.4029956 0.05703407  0.2978355  0.5178994
## psi_156266 0.4029956 0.05703407  0.2978355  0.5178994

You can see that fitted() returns the back-transformed beta estimates, standard errors, and confidence intervals on a site-by-site basis. Although you are viewing only the first 10 sites, every site has the same maximum likelihood estimate of \(\psi\) = 0.4029956 because our formula was \(\psi\) ~ 1 (i.e., \(\psi\) does not vary by site).

Similarly, we can look at the ‘p’ estimates:

head(fitted(object = mod1, param = 'p'), n = 10)
##                 est        se lower_0.95 upper_0.95
## p1_27530  0.8933103 0.0580739  0.7172243  0.9650845
## p1_50339  0.8933103 0.0580739  0.7172243  0.9650845
## p1_65048  0.8933103 0.0580739  0.7172243  0.9650845
## p1_79081  0.8933103 0.0580739  0.7172243  0.9650845
## p1_82884  0.8933103 0.0580739  0.7172243  0.9650845
## p1_103396 0.8933103 0.0580739  0.7172243  0.9650845
## p1_108217 0.8933103 0.0580739  0.7172243  0.9650845
## p1_114488 0.8933103 0.0580739  0.7172243  0.9650845
## p1_149175 0.8933103 0.0580739  0.7172243  0.9650845
## p1_156266 0.8933103 0.0580739  0.7172243  0.9650845

Here we see that \(p_1\) has an estimate of 0.8933103.

Because this is a single season occupancy analysis where we did not include site covariates (such as elevation or patch size) or survey covariates (such as observer or weather conditions), we can simply use the unique() function to extract one value for each parameter of interest:

unique(fitted(mod1, param = 'psi'))
##                 est         se lower_0.95 upper_0.95
## psi_27530 0.4029956 0.05703407  0.2978355  0.5178994
unique(fitted(mod1, param = 'p'))
##                est         se lower_0.95 upper_0.95
## p1_27530 0.8933103 0.05807390  0.7172243  0.9650845
## p2_27530 0.6617111 0.08676183  0.4778312  0.8069934
## p3_27530 0.7940536 0.07473524  0.6115473  0.9042400
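
You can recover these real estimates from the beta coefficients yourself. Notice from the coef() output that p1.Int is the logit-scale intercept, while p2.Int and p3.Int are offsets from it (the usual treatment-contrast coding), so the inverse-logit function plogis() reproduces the fitted() output. A sketch, using the betas reported above:

```r
# Beta estimates from coef(mod1, ...) above (logit scale)
psi_int <- -0.392999
p_betas <- c(2.125009, -1.454080, -0.775474)  # intercept, survey-2 offset, survey-3 offset

# Back-transform with the inverse-logit: plogis(x) = 1 / (1 + exp(-x))
plogis(psi_int)                  # psi ~ 0.403
plogis(p_betas[1])               # p1  ~ 0.893
plogis(p_betas[1] + p_betas[2])  # p2  ~ 0.662
plogis(p_betas[1] + p_betas[3])  # p3  ~ 0.794
```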

And there is your answer. These are the single season maximum likelihood estimates for our troll data when we run a model that uniquely estimates \(\psi\), \(p_1\), \(p_2\), and \(p_3\). Let’s plot these results. First, let’s create a dataframe that can be sent to ggplot():

# extract the psi content
data <- data.frame(unique(fitted(mod1, param = 'psi')))

# add the detection rows
data <- rbind(data, unique(fitted(mod1, param = 'p')))

# add a column called Parameter
data$Parameter <- c("Psi", "p1", "p2", "p3")

# look at the data
data
##                 est         se lower_0.95 upper_0.95 Parameter
## psi_27530 0.4029956 0.05703407  0.2978355  0.5178994       Psi
## p1_27530  0.8933103 0.05807390  0.7172243  0.9650845        p1
## p2_27530  0.6617111 0.08676183  0.4778312  0.8069934        p2
## p3_27530  0.7940536 0.07473524  0.6115473  0.9042400        p3

Next, we’ll plot the estimates as bars, and add in their 95% confidence limits:

ggplot(data, aes(x = Parameter, y = est)) +
  labs(x = "Parameter", y = "Estimate") +
  geom_bar(stat = 'identity') +
  geom_errorbar(aes(ymin = lower_0.95, ymax = upper_0.95), width = 0.2) +