# 7 Single Season Occupancy Models

## 7.1 Introduction

The last few chapters have been heavy on statistical probability, but we promise they provided important background material. We know you are itching to get back to the main problem at hand: estimating the probability that sites are occupied by a target species of interest. In this chapter, we turn to single season occupancy models, which follow the material in MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle, and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83:2248-2255. This article is on the web at http://www.uvm.edu/envnr/vtcfwru/spreadsheets/Occupancy/Occupancy3.htm, and the model is also described in Chapter 4 of the book, Occupancy Estimation and Modeling.

## 7.2 Detection histories

For occupancy models, the goal is to determine the probability that a site is occupied by a target of interest, usually a species, given that detection is imperfect. Thus, we are interested in two things. First, is the target present at a site? What factors influence the presence or absence of a target at a site? Second, for targets that are present, is the target detected by an observer? What factors influence the detection or non-detection? In this sense, occupancy models are “hierarchical” models in that there is a process that first establishes the distribution of targets (the true presence or absence of the target), and then there is an observation process of the resulting distribution (the detection or non-detection of the target). As researchers, the data we have in hand are the latter – the detection or non-detection of target species across surveys. From that information, we try to understand the former – the true presence or absence of the target, and variables that might influence this pattern.
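This two-stage hierarchy is easy to simulate, which makes it concrete: a state process generates the true presence or absence at each site, and an observation process generates detections only where the target is truly present. Here is a minimal sketch in R; the values of $$\psi$$ and $$p$$ are invented for illustration.

```r
set.seed(1)

n_sites <- 75     # number of study sites
n_surveys <- 3    # surveys per site
psi <- 0.4        # probability a site is occupied (hypothetical)
p   <- 0.8        # probability of detection, given presence (hypothetical)

# state process: the true (but unobservable) presence/absence at each site
z <- rbinom(n_sites, size = 1, prob = psi)

# observation process: detections can only occur where z = 1
y <- matrix(rbinom(n_sites * n_surveys, size = 1, prob = p),
            nrow = n_sites) * z

head(y)  # rows are sites; each row is a detection history
```

Note that as researchers we only ever see `y`; the whole point of occupancy modeling is to recover `psi` and `p` without ever observing `z` directly.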

As an example, let’s assume you are interested in developing an occupancy model for trolls of Middle Earth. You select 75 study sites, and set out to survey each site three times in quick succession (or quick enough to assume that the site does not change in occupancy status between surveys). Or, alternatively, if the site is large enough, the site could have been sampled in three different locations in the same time period.

In any event, the species is recorded as being detected (1) or not (0) for each site based on your survey efforts. Let’s assume that on site 1 a troll was detected on the first survey, but not detected on the second or third survey. This is called a detection history or encounter history for site 1, and would be recorded as ‘1-0-0’. In the same way, the detection history for each site is determined. A history of ‘1-1-0’ means that a species of interest was detected on the first and second samples, but not on sample 3. A history of ‘0-1-0’ means that the species was detected on the second sampling occasion, but not on the first or third occasion.
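Conceptually, a detection history is just a site's row of 0's and 1's collapsed into a string. A quick illustration in R (the three sites and their detections here are made up):

```r
# each row is one hypothetical site's detections across 3 surveys
dets <- rbind(site1 = c(1, 0, 0),
              site2 = c(1, 1, 0),
              site3 = c(0, 1, 0))

# collapse each row of 0's and 1's into a detection-history string
histories <- apply(dets, 1, paste0, collapse = "")
histories
```

Site 1's row collapses to "100", matching the '1-0-0' history described above.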

How many possible histories are there? Because a particular sample occasion has only two outcomes (0 or 1), and there are 3 sampling occasions, there are $$2^3$$ = 8 kinds of histories in total. These histories are:

# all possible combinations of 0:1 for 3 surveys
htable <- expand.grid("Survey 1" = 1:0, "Survey 2" = 1:0, "Survey 3" = 1:0)
Table 7.5: The occupancy model’s log likelihood is the sum of the final column.

| Survey 1 | Survey 2 | Survey 3 | History | Frequency | Probability | LnProb | Product |
|:---:|:---:|:---:|:---:|:---:|:---|:---:|:---:|
| 1 | 1 | 1 | 111 | 15 | psi * p1 * p2 * p3 | Ln(Prob) | Ln(Prob) * Freq |
| 0 | 1 | 1 | 011 | 1 | psi * (1-p1) * p2 * p3 | Ln(Prob) | Ln(Prob) * Freq |
| 1 | 0 | 1 | 101 | 7 | psi * p1 * (1-p2) * p3 | Ln(Prob) | Ln(Prob) * Freq |
| 0 | 0 | 1 | 001 | 1 | psi * (1-p1) * (1-p2) * p3 | Ln(Prob) | Ln(Prob) * Freq |
| 1 | 1 | 0 | 110 | 3 | psi * p1 * p2 * (1-p3) | Ln(Prob) | Ln(Prob) * Freq |
| 0 | 1 | 0 | 010 | 1 | psi * (1-p1) * p2 * (1-p3) | Ln(Prob) | Ln(Prob) * Freq |
| 1 | 0 | 0 | 100 | 2 | psi * p1 * (1-p2) * (1-p3) | Ln(Prob) | Ln(Prob) * Freq |
| 0 | 0 | 0 | 000 | 45 | psi * (1-p1) * (1-p2) * (1-p3) + (1-psi) | Ln(Prob) | Ln(Prob) * Freq |

The sum of the encounter history probabilities must equal 1 (since we’ve accounted for every possible combination).

## 7.6 Occupancy MLEs

Once we have written out the encounter history probabilities in terms of the 4 occupancy parameters, our multinomial log likelihood function

$$log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{p_{111}}) +\\ y_{\color{blue}{011}} log(\color{brown}{p_{011}}) +\\ y_{\color{blue}{101}} log(\color{brown}{p_{101}}) + ... +\\ y_{\color{blue}{000}} log(\color{brown}{p_{000}}) \tag{7.4}$$

becomes:

$$log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{\psi*p_1*p_2*p_3}) +\\ y_{\color{blue}{011}} log(\color{brown}{\psi*(1-p_1)*p_2*p_3}) +\\ y_{\color{blue}{101}} log(\color{brown}{\psi*p_1*(1-p_2)*p_3}) + ... +\\ y_{\color{blue}{000}} log(\color{brown}{\psi*(1-p_1)*(1-p_2)*(1-p_3) + (1-\psi)}) \tag{7.5}$$

Our goal is to find the combination of values of $$\psi, p_1, p_2, p_3$$ that yields the maximum value of the log-likelihood function listed above.
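As a sketch of how Table 7.5 turns into a single number, the function below (our own illustration; the name `loglik` is not from RPresence) computes the multinomial log likelihood for any candidate values of $$\psi, p_1, p_2, p_3$$, using the history frequencies from the table:

```r
# observed frequencies of each detection history (from Table 7.5)
freq <- c("111" = 15, "011" = 1, "101" = 7, "001" = 1,
          "110" = 3, "010" = 1, "100" = 2, "000" = 45)

# multinomial log likelihood for candidate parameter values
loglik <- function(psi, p1, p2, p3) {
  prob <- c("111" = psi * p1 * p2 * p3,
            "011" = psi * (1 - p1) * p2 * p3,
            "101" = psi * p1 * (1 - p2) * p3,
            "001" = psi * (1 - p1) * (1 - p2) * p3,
            "110" = psi * p1 * p2 * (1 - p3),
            "010" = psi * (1 - p1) * p2 * (1 - p3),
            "100" = psi * p1 * (1 - p2) * (1 - p3),
            "000" = psi * (1 - p1) * (1 - p2) * (1 - p3) + (1 - psi))
  sum(freq * log(prob))  # Ln(Prob) * Freq, summed over all histories
}

# evaluate at one candidate combination of parameter values
loglik(psi = 0.5, p1 = 0.5, p2 = 0.5, p3 = 0.5)  # about -109.07
```

Each line of `prob` is one row of Table 7.5's Probability column, and the final sum is the sum of the table's Product column.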
One method is to try every possible combination of values for the parameters, compute the log-likelihood value, and then choose the combination of parameters that gave the maximum value. For example, let’s compute the log likelihood for $$\psi$$ = 0.7, $$p_1$$ = 0.8, $$p_2$$ = 0.9, and $$p_3$$ = 1:

$$log(\mathcal{L}(p_i | n, y_i)) \propto y_{\color{blue}{111}} log(\color{brown}{0.7*0.8*0.9*1}) +\\ y_{\color{blue}{011}} log(\color{brown}{0.7*(1-0.8)*0.9*1}) +\\ y_{\color{blue}{101}} log(\color{brown}{0.7*0.8*(1-0.9)*1}) + ... +\\ y_{\color{blue}{000}} log(\color{brown}{0.7*(1-0.8)*(1-0.9)*(1-1) + (1-0.7)}) \tag{7.6}$$

If you did the math, you would find the result. The log likelihood could be similarly computed for other combinations too. We input a certain combination of parameters, which in turn affects the history probabilities, which in turn affect the natural log probabilities, which in turn affect the products of the log probabilities times the history frequencies, which in turn affect the final LogLike value:

$$\text{parameters} \rightarrow \text{history probabilities} \rightarrow \text{log probabilities} \rightarrow \text{product} \rightarrow \text{multinomial log likelihood}$$

We seek the combination of parameters that maximizes the LogLike value. In doing so, we are finding parameter values that best describe the patterns of 0’s and 1’s that were observed in our surveys. The parameter values that result in the maximum LogLike are called “Maximum Likelihood Estimates”, or “MLEs”, and we base our scientific inferences on these estimates. In Chapter 5 we mentioned how computers excel at finding the maximum of a function. They also excel at finding the minimum of a function. The package RPresence approaches the problem by minimizing the -2LogLike function instead of maximizing the LogLike function (the MLEs are identical).
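Rather than trying combinations by hand, we can let R's general-purpose optimizer do the searching. The sketch below is our own illustration (not RPresence code): it minimizes the -2LogLike over unconstrained parameters, back-transformed to probabilities with `plogis()` (R's inverse-logit), which mirrors the approach described above:

```r
# observed frequencies of each detection history (from Table 7.5)
freq <- c("111" = 15, "011" = 1, "101" = 7, "001" = 1,
          "110" = 3, "010" = 1, "100" = 2, "000" = 45)

# -2LogLike as a function of unconstrained (logit-scale) parameters
neg2loglik <- function(betas) {
  psi <- plogis(betas[1])    # back-transform to probabilities
  p   <- plogis(betas[2:4])
  prob <- c(psi * p[1] * p[2] * p[3],
            psi * (1 - p[1]) * p[2] * p[3],
            psi * p[1] * (1 - p[2]) * p[3],
            psi * (1 - p[1]) * (1 - p[2]) * p[3],
            psi * p[1] * p[2] * (1 - p[3]),
            psi * (1 - p[1]) * p[2] * (1 - p[3]),
            psi * p[1] * (1 - p[2]) * (1 - p[3]),
            psi * (1 - p[1]) * (1 - p[2]) * (1 - p[3]) + (1 - psi))
  -2 * sum(freq * log(prob))
}

# minimize -2LogLike, starting from betas of 0 (i.e., probabilities of 0.5)
fit <- optim(par = rep(0, 4), fn = neg2loglik)

# MLEs on the probability scale: psi, p1, p2, p3
round(plogis(fit$par), 3)
```

If you run this, the four estimates should land very close to the values RPresence reports later in this chapter ($$\psi$$ ≈ 0.403, $$p_1$$ ≈ 0.893, $$p_2$$ ≈ 0.662, $$p_3$$ ≈ 0.794).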
In Chapters 4 and 5, we mentioned that optimization routines often operate on inputs that are unconstrained, and this is true in the package RPresence as well. RPresence does not work directly on the constrained occupancy parameters $$\psi$$, $$p_1$$, $$p_2$$, and $$p_3$$, which are all probabilities bound between 0 and 1. Instead, it works on unconstrained beta parameters ($$\beta$$s). Thus, RPresence finds the betas that minimize the -2LogLike.

$$\text{betas} \rightarrow \text{parameters} \rightarrow \text{history probabilities} \rightarrow \text{log probabilities} \rightarrow \text{product} \rightarrow \text{multinomial log likelihood} \rightarrow \text{-2LogLike}$$

Well, you may ask, what are the MLEs for the troll data? It’s finally time to run your first analysis in the program RPresence.

## 7.7 RPresence

Hopefully RPresence is installed on your machine, so we simply need to load it into R so that its functions are available to us:

library(RPresence)

The troll data that we have been exploring are included in RPresence directly as a CSV (comma-separated values) file.
We can read it in with the following code:

# read in the data
trolls <- read.csv(system.file("extdata/trolls.csv", package = "RPresence"), header = TRUE)

# peek at the data
head(trolls)

##    siteID ValianDate region detection city hill river forest wetland hill15
## 1 1085441 1200/01/10   Arda         0    0    Y  0.00     0.0   0.000  0.443
## 2 1085441 1200/01/11   Arda         1    0    Y  0.00     0.0   0.000  0.443
## 3 1085441 1200/01/12   Arda         1    0    Y  0.00     0.0   0.000  0.443
## 4 1085441 1200/01/13   Arda         0    0    Y  0.00     0.0   0.000  0.443
## 5 1085441 1200/01/14   Arda         0    0    Y  0.00     0.0   0.000  0.443
## 6  985373 1200/01/10   Arda         0    0    N  0.03     0.1   0.007  0.083
##   stonefields northing random easting observer
## 1           0 576318.6  0.838 1735759  Legolas
## 2           0 576318.6  0.838 1735759  Legolas
## 3           0 576318.6  0.838 1735759   Gollum
## 4           0 576318.6  0.838 1735759   Gollum
## 5           0 576318.6  0.838 1735759   Gollum
## 6           0 634318.6  0.047  673759  Aragorn

The head() function shows us the first 6 rows of the troll data-set. Each site in Middle Earth has a unique ID, such as “1085441”. The data-set shows the detection or non-detection of trolls for each site on a given date by a given observer. For instance, the first record was surveyed on 1200/01/10 by Legolas, who did not detect any trolls (“detection” = 0). Legolas surveyed this same site the next day, 1200/01/11, where he detected a troll, as did Gollum on 1200/01/12. Perhaps Gollum spotted the same troll as Legolas, but perhaps not. The data only convey the detection or non-detection of a target.

The data-set also contains conditions associated with each site. For example, the column “city” indicates whether a site is within 30km of any Middle Earth city. There are other columns as well that describe site characteristics. We won’t be dealing with these columns in this chapter, but will revisit them in Chapter 8, where we will allow site occupancy to be a function of such variables.
## 7.8 createPao()

The basic input for nearly every analysis in RPresence is an R object of class “pao”, which stands for “presence-absence object”. A “pao” object is created with the createPao() function. The output of this function is a large R object that contains all of the information needed to run an analysis in RPresence. It is well worth your time to look at this function’s help page:

help(createPao)

The function’s description tells us that it “Creates a pao data object from R variables”. The function has several arguments, and many are populated with default values, as indicated by the equal sign.

args(createPao)

## function (data, unitcov = NULL, survcov = NULL, nsurveyseason = ncol(data),
##     nmethods = 1, frq = rep(1, nrow(data)), title = "PRESENCE Analysis",
##     unitnames = paste0("unit", 1:nrow(data)), paoname = "pres.pao")
## NULL

Let’s explore the first argument, which is the only argument that does not have a default value:

• data = data frame containing the detection (= 1) and non-detection (= 0) data, with observations from a sampling unit along a row and surveys in the columns. Missing values are allowed (= NA).

Let’s stop there and shape our troll data to the format that createPao() needs. Below, we use the spread() function in the package tidyr to turn our troll data-set from “long” format to “wide” format, where dates are spread across columns and sites are listed down rows. Thus, “ValianDate” is the “key” argument to the spread() function, and “detection” is the “value” argument to the spread() function.
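To see what this long-to-wide reshaping does, here is a tiny made-up example with two sites and two dates. We use base R's reshape() here so the sketch runs without any packages, but tidyr::spread() performs the same job on the troll data:

```r
# a hypothetical long-format data frame: one row per site-by-date survey
long <- data.frame(siteID     = c(1, 1, 2, 2),
                   ValianDate = c("1200/01/10", "1200/01/11",
                                  "1200/01/10", "1200/01/11"),
                   detection  = c(0, 1, 1, 1))

# spread the dates across columns: one row per site, one column per date
wide <- reshape(long, idvar = "siteID", timevar = "ValianDate",
                direction = "wide")
wide
```

Each site now occupies a single row, and each date's detection occupies its own column, which is exactly the shape createPao() wants.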
Have a look:

# assign encounter histories to each site
det_hist <- tidyr::spread(data = trolls[, c('ValianDate', 'siteID', 'detection')],
                          key = ValianDate,
                          value = detection)
head(det_hist, n = 10)

##    siteID 1200/01/10 1200/01/11 1200/01/12 1200/01/13 1200/01/14
## 1   27530          0          0          0          0          0
## 2   50339          1          1          1          0          1
## 3   65048          0          0          0          0          0
## 4   79081          1          1          1          1          1
## 5   82884          0          0          0          0          0
## 6  103396          1          0          0          1          0
## 7  108217          0          0          0          0          0
## 8  114488          1          1          1          1          1
## 9  149175          0          0          0          0          0
## 10 156266          1          0          1          1          1

You can confirm that each site is listed by row, and detections for that site are provided by the different columns. Notice now that the format of the data differs from the summarized frequencies shown in many of the introductory tables. This particular data-set gives results of 5 surveys. For our purposes, we will only include the first three surveys to match up with our introductory prose. Columns 2, 3, and 4 are sufficient to input to the createPao() function’s data argument.

At this point, we have enough information to use the createPao() function and create an R object of class “pao”:

# trim the dataset to just three surveys
data <- det_hist[-c(1, 5, 6)]

# send the data to the createPao function
trollPao <- createPao(data = data)

# look at the structure of the returned object; note the class is "pao"
str(trollPao, max.level = 1)

## List of 15
##  $ nunits       : int 75
##  $ nsurveys     : int 3
##  $ nseasons     : int 1
##  $ nmethods     : num 1
##  $ det.data     :'data.frame':   75 obs. of  3 variables:
##  $ nunitcov     : int 0
##  $ unitcov      :'data.frame':   0 obs. of  0 variables
##  $ nsurvcov     : int 1
##  $ survcov      :'data.frame':   225 obs. of  1 variable:
##  $ nsurveyseason: int 3
##  $ title        : chr "PRESENCE Analysis"
##  $ unitnames    : chr [1:75] "unit1" "unit2" "unit3" "unit4" ...
##  $ surveynames  : chr [1:3] "1-1" "1-2" "1-3"
##  $ paoname      : chr "pres.pao"
##  $ frq          : num [1:75] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "class")= chr "pao"

As you can see, an R object of class “pao” is a list with 15 elements. Many of these elements were filled in with default values used by the createPao() function. Let’s work our way through this list:

• $ nunits = 75. There are 75 total sites.
• $ nsurveys = 3. There are 3 surveys in our data-set.
• $ nseasons = 1. This is a single-season occupancy model, so the number of seasons is 1.
• $ nmethods = 1. This is the number of methods used for detection. We’ll discuss it later.
• $ det.data = a data frame with 3 variables. We’ll look at this in a minute, but it is our encounter history data that we fed into the function.
• $ nunitcov = 0. Each site has 0 site-level covariates associated with it in the unitcov list element, described next.
• $ unitcov = a dataframe that contains information about site-level covariates, such as elevation, hills, forest cover, etc. It’s an empty dataframe at the moment because we did not send the createPao() function any site-level covariate data.
• $ nsurvcov = 1. There is just one survey covariate in the survcov dataframe, described next.
• $ survcov = a dataframe with 1 variable. We’ll look at this in a minute.
• $ nsurveyseason = 3. There are 3 surveys per season.
• $ title = “PRESENCE Analysis”.
• $ unitnames = These are the default site names, because we did not include site names in our input data-set to createPao(). We’ll change these in a minute.
• $ surveynames = These are the default names of the surveys. The notation 1-1 means “season 1, survey 1”; 1-2 means “season 1, survey 2”, etc.
• $ paoname = “pres.pao”. This is a filename to save the input data for use with Presence (Windows version).
• $ frq = a vector of 1’s. The data-set we input provides each site’s survey information on a row-by-row basis. Thus, the frequency associated with each site is 1.

We can use list-indexing to change list elements, if desired. Here, we’ll change the title and site names:

# change the title
trollPao$title <- "Troll of Middle Earth Single-Season Occupancy Data"

# change the site names
trollPao$unitnames <- det_hist$siteID
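This kind of list-indexing works on any named R list, not just pao objects. A self-contained toy example (using a made-up list, so it runs without RPresence):

```r
# a toy named list, mimicking how a pao object stores its pieces
pao_like <- list(nunits = 75, title = "PRESENCE Analysis")

# read an element with $, and overwrite it by assignment
pao_like$title <- "Troll of Middle Earth Single-Season Occupancy Data"
pao_like$title
pao_like$nunits  # unchanged elements keep their values
```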

Let’s now inspect a few elements of this object. The list element named det.data is the encounter histories that we input to the function.

head(trollPao$det.data)

##   1200/01/10 1200/01/11 1200/01/12
## 1          0          0          0
## 2          1          1          1
## 3          0          0          0
## 4          1          1          1
## 5          0          0          0
## 6          1          0          0

The list element named survcov also stores a dataframe. This dataframe has just one column, called SURVEY, and it has 75 sites * 3 surveys = 225 rows. It indicates which survey each of the detection-history codes belongs to. It’s ordered by column, so the first 75 values of the covariate are 1’s for survey 1, the next 75 are 2’s for survey 2, and the last 75 are 3’s for survey 3. The first 3 and last 3 are shown below:

head(trollPao$survcov, n = 3)
##   SURVEY
## 1      1
## 2      1
## 3      1
tail(trollPao$survcov, n = 3)

##     SURVEY
## 223      3
## 224      3
## 225      3

The summary() function will provide an overview of the pao object we created. The naive occupancy rate is given as 0.4. This is called “naive” because it is the probability of occupancy if you assume that detection was “perfect” (i.e., you never miss detecting the species if it is present). It is computed as the proportion of sites that had at least one detection across the 3 surveys. This is not the true occupancy rate (unless detection is perfect, that is).

summary(trollPao)

## paoname=pres.pao
## title=Troll of Middle Earth Single-Season Occupancy Data
## Naive occ=0.4
##        nunits      nsurveys      nseasons nsurveyseason      nmethods
##          "75"           "3"           "1"           "3"           "1"
##      nunitcov      nsurvcov
##           "0"           "1"
## unit covariates  :
## survey covariates: SURVEY

See if you can match up the summary() output with items in the pao list. To summarize so far, the createPao() function in RPresence is used to shape up your data for occupancy analysis. Importantly, almost every single analysis that you will run in RPresence begins with the createPao() function. You will save yourself many headaches if you understand how this object is structured.

## 7.9 occMod()

At last! Now that we have created our input pao object, we can run the single season occupancy analysis and obtain multinomial maximum likelihood estimates for $$\psi$$, $$p_1$$, $$p_2$$, and $$p_3$$. In RPresence, the main function for running occupancy analyses is the occMod() function. This is a workhorse function, and we will re-visit it in many future chapters. If you haven’t guessed, our first stop is the function’s help page:

help(occMod)

This function fits a wide variety of occupancy models using program Presence.
args(occMod)

## function (model, cov.list = NULL, data, type = NULL, conf = 0.95,
##     modname = NULL, paoname = NULL, outfile = NULL, fixed = NULL,
##     initvals = NULL, modfitboot = NULL, VCoutopt = "realbetavc",
##     noDerived = F, randinit = 0, maxfn = "32000", quiet = 1,
##     limit.real = F, threads = 1, ...)
## NULL

There are several arguments to the occMod() function, and many have default values. Can you spot them? We can view the definitions and defaults for the arguments by using the help function (help(‘occMod’)), but here’s an overview:

• model = list of formulas, one for each parameter. We need a formula for $$\psi$$ and a formula for $$p$$ in general.
• cov.list = list of covariates used in the model.
• data = pao data object (object of class pao).
• type = a string indicating the type of occupancy model to fit. In this chapter, we are fitting a single season occupancy model, which has the type of “so” (static occupancy).
• conf = level for confidence interval (may be vector valued). The default is 0.95.
• modname = (optional) a string with the user-supplied name for a model. If NULL, modname is created from the formulas supplied in model.
• paoname = (optional) a string with the user-supplied filename for PRESENCE data and output files. If NULL, a generated name is used.
• outfile = name for output file (use outfile = ‘modname’ for an outfile named via the model name).
• fixed = a data.frame with variables “param” and “value”. param should be a vector of string values containing the names of the real parameters (e.g., “p(2)”) to fix, and value should be a numeric vector of the values the real parameters should be fixed at. For example, to fix p(2) and p(3) to zero: fixed = data.frame(param = c(‘p(2)’, ‘p(3)’), value = c(0, 0))
• initvals = a vector providing initial values for regression coefficients for the optimization procedure used by PRESENCE.
• modfitboot = number of bootstraps for assessing model fit. Only works for type = “so”.
We will examine goodness of fit in a later chapter.

• VCoutopt = option for PRESENCE computation and output of beta/real var-cov matrices.
• noDerived = if TRUE, doesn’t print derived estimates from the model.
• randinit = number of random initial starting value vectors to try (default = 0).
• maxfn = max number of function calls before aborting optimization.
• quiet = 0 = normal output from Presence, 1 = no output (for simulations or repeated runs).
• limit.real = limit output of real parameter estimates to the first 100 sites (default = FALSE).
• threads = number of cpu threads to use in optimization (default = 1).

Some of these arguments (such as outfile) make sense only if you remember that RPresence collects the input data, then sends the information to Presence (which came bundled with RPresence), which does the actual analysis and then sends results back to R. If you are a Presence user, the outfile may be of interest to you. If not, no worries!

We can run our single season analysis of trolls by providing just 3 arguments (using default values for the remaining arguments):

1. We send our recently created pao object (trollPao) to occMod()’s data argument.
2. We set the type argument to “so”, for static occupancy model.
3. The model argument specifies the model to be run, and is sent as a list that has formula notation:

list(psi ~ 1, p ~ SURVEY)

## [[1]]
## psi ~ 1
##
## [[2]]
## p ~ SURVEY

The list has one list element for each parameter of a given model. For single season occupancy analysis of troll data, we provide a formula for $$\psi$$ and just one formula for $$p$$.

• $$\psi$$ ~ 1 provides the formula for $$\psi$$. The “1” is a placeholder for “intercept”. It is called an “intercept” model because $$\psi$$ is not a function of covariates (i.e., $$\psi$$ is the same for all sites). For example, if we wished to set $$\psi$$ as a function of elevation, the formula would be $$\psi$$ ~ elevation.
$$\psi$$ ~ 1 will simply return the intercept (a single estimate) for $$\psi$$, which would apply to all sites. $$\psi$$ ~ elevation will return two parameters, an intercept and a term for the effect of elevation. The intercept is implied if it is not included in the formula.

• $$p$$ ~ SURVEY. For our troll data, we have 3 survey events. This is the notation we use if we wish to estimate $$p_1$$, $$p_2$$, and $$p_3$$ uniquely. The term SURVEY should ring a bell – our pao object stores a dataframe called survcov, which contains a single column called SURVEY, which was made by default. In fact, if you read the Details section of the helpfile, SURVEY is a pre-defined covariate name reserved by occMod() (make sure to steer clear of these names for your personal use). If we wish to run a model where we force $$p_1$$ = $$p_2$$ = $$p_3$$, then we are estimating just a single detection parameter, and our formula would be $$p$$ ~ 1, or the intercept model.

Let’s run our first model:

# run occMod, and store the result as mod1
mod1 <- occMod(model = list(psi ~ 1, p ~ SURVEY), data = trollPao, type = "so")

# look at the structure of mod1
str(mod1, max.level = 1)

## List of 14
##  $ modname    : chr "psi()p(SURVEY)"
##  $ model      :List of 2
##  $ dmat       :List of 2
##  $ data       :List of 15
##   ..- attr(*, "class")= chr "pao"
##  $ outfile    : chr "del_6616_6321.out"
##  $ neg2loglike: num 188
##  $ npar       : int 4
##  $ aic        : num 196
##  $ beta       :List of 5
##  $ real       :List of 2
##  $ derived    :List of 1
##  $ gof        :List of 4
##  $ warnings   :List of 2
##  $ version    :List of 2
##  - attr(*, "class")= chr [1:2] "occMod" "so"

You can see that occMod() returns an object of class “occMod” and “so”. It is a list that contains several elements, and is described in the Value section of the helpfile:

• $ modname = “psi()p(SURVEY)”. This is the model name created by default because we did not provide the modname argument to occMod().
• $ model = a named list containing the right-hand formula for each real parameter type. This matches our input.
• $ dmat = the model’s design matrix. This matrix provides crucial information for solving the optimization and defining parameters, and we will look at it at the end of the chapter.
• $ data = the object of class pao that was sent to the function.
• $ outfile = “del_15864_8465.out”. This is a text file containing the output from Presence. It is automatically deleted when occMod() finishes. It can be saved by specifying a name for the “outfile” argument in the occMod() function call.
• $ neg2loglike = 188. This is the minimum -2LogLike that Presence found for our troll data. If we multiply this by -0.5, we find the model’s LogLike, which is -94. Not bad, when you consider that the saturated model’s LogLike was -93.665. This means our model adequately describes the pattern of 0’s and 1’s in the encounter history. We’ll delve much deeper into the topic of goodness of fit in a future chapter.
• $ npar = 4. This is the number of parameters that were estimated in the model. We estimated four parameters: $$\psi$$, $$p_1$$, $$p_2$$, and $$p_3$$.
• $ aic = 196. This is computed as -2LogLike + 2 * npar, or 188 + 2 * 4 = 196. We will delve deeper into the topic of AIC and model selection in a future chapter.
• $ beta = list of estimated regression coefficients (sometimes called beta parameters), standard errors, and variance-covariance matrix for each parameter type.
• $ real = list of real parameter estimates on their natural scale (i.e., 0-1 for probabilities). A data frame for each parameter type that contains the estimate, standard error, and limits of the requested confidence intervals.
• $ derived = list of parameter estimates that are derived from real parameters and not directly estimated.
• $ gof = provides information about model fit, which is covered in a future chapter.
• $ warnings = a list containing any warnings from the PRESENCE output file. We will get in the habit of looking at this.
• $ version = a list containing the version numbers of PRESENCE and RPresence used when fitting the model.

This object is crammed full of information. It holds all of the content that was input into occMod(), and all of the model output and associated information. You can use list indexing to pull certain elements out of the resulting object. However, Darryl and Jim included some “methods” in RPresence that will aid in the most common tasks. A “method” in R is a function that works on an object of a specific class. Let’s use the methods() function to return a list of methods that work on objects of class “occMod”:

methods(class = class(mod1)[1])

## [1] coef    fitted  predict summary
## see '?methods' for accessing help and source code

Here, we see four methods listed: coef(), fitted(), predict(), and summary(). If we pass our mod1 object to any of these functions, we will be able to retrieve key components of the output. For example, let’s run the summary() function (method):

summary(mod1)

## Model name=psi()p(SURVEY)
## AIC=196.2483
## -2*log-likelihood=188.2483
## num. par=4
## Warning: Numerical convergence may not have been reached. Parameter esimates converged to approximately 7 signifcant digits.

As you can see, summary() pulls out certain elements of the mod1 list so you can quickly inspect the key results. Can you match these with the output list elements? The warning is important: “Numerical convergence may not have been reached. Parameter estimates converged to approximately 7 significant digits.” Remember, we are trying to find combinations of parameters (betas) that minimize the -2LogLike. In the process of optimization, many calculations are performed and quantities can be very small. At some point, computer “round-off” error occurs. For example, the computer cannot represent the number 1/3 exactly, but stores it as something like 0.3333333333333. After thousands of calculations, these round-off errors can build up and the result will be inexact.
Fortunately, there is a method to determine how accurate the final -2LogLike value is, and this warning is an indication of that. In this case, the warning is stating that the result of the optimization has 7 significant digits. If the number of significant digits is less than 3, it is an indication of an unstable likelihood function, and estimates might be suspect. The maximum value returned by occMod() is 7, so no worries here.

If a model does produce a low number of significant digits, there are some steps you can take to resolve it. The usual cause of the problem is a model which is too complicated for the data. The solution in that case is to try a less-complicated model, or to include more data. With complicated models, the optimization routine may benefit from having better starting values for the parameters. Starting values can be specified as an argument to occMod(), or you can try a sequence of random starting values using the “randinit” argument.

Let’s try the coef() method to extract the model coefficients. Here, for the single season model, we pass in our mod1 object and specify either ‘psi’ or ‘p’ for the param argument:

coef(object = mod1, param = 'psi')

##               est      se
## psi.Int -0.392999 0.23706

coef(object = mod1, param = 'p')

##              est       se
## p1.Int  2.125009 0.609335
## p2.Int -1.454080 0.711398
## p3.Int -0.775474 0.744918

Savvy readers will notice that these results can be located directly in the mod1 list element called “beta”, which provides not only the beta estimates and standard errors but also the variance-covariance matrices:

mod1$beta
## $psi
##               est      se
## psi.Int -0.392999 0.23706
##
## $psi.VC
##          [,1]
## [1,] 0.056197
##
## $p
##              est       se
## p1.Int  2.125009 0.609335
## p2.Int -1.454080 0.711398
## p3.Int -0.775474 0.744918
##
## $p.VC
##           B1        B2        B3
## B1  0.371289 -0.363575 -0.358669
## B2 -0.363575  0.506087  0.354935
## B3 -0.358669  0.354935  0.554903
##
## $VC
##           A1        B1        B2        B3
## A1  0.056197 -0.004357  0.002984  0.002102
## B1 -0.004357  0.371289 -0.363575 -0.358669
## B2  0.002984 -0.363575  0.506087  0.354935
## B3  0.002102 -0.358669  0.354935  0.554903

These beta coefficients are the key outputs for a model. These are the values of the parameters that minimize the -2LogLike function. Remember, we seek the combination of parameters that maximizes the LogLike value. In doing so, we are finding parameter values that best describe the patterns of 0’s and 1’s that were observed in our surveys. The parameter values that result in the maximum LogLike are called “Maximum Likelihood Estimates”, or “MLEs”, and we base our scientific inferences on these estimates.

$$\text{betas} \rightarrow \text{parameters} \rightarrow \text{history probabilities} \rightarrow \text{log probabilities} \rightarrow \text{product} \rightarrow \text{multinomial log likelihood} \rightarrow \text{-2LogLike}$$

Hopefully you remember that betas are unconstrained parameters, and we now need to transform them back to probabilities. The fitted() method can help us here. We pass in our mod1 object and specify which parameter (‘psi’ or ‘p’) we are interested in:

head(fitted(object = mod1, param = 'psi'), n = 10)

##                  est         se lower_0.95 upper_0.95
## psi_27530  0.4029956 0.05703407  0.2978355  0.5178994
## psi_50339  0.4029956 0.05703407  0.2978355  0.5178994
## psi_65048  0.4029956 0.05703407  0.2978355  0.5178994
## psi_79081  0.4029956 0.05703407  0.2978355  0.5178994
## psi_82884  0.4029956 0.05703407  0.2978355  0.5178994
## psi_103396 0.4029956 0.05703407  0.2978355  0.5178994
## psi_108217 0.4029956 0.05703407  0.2978355  0.5178994
## psi_114488 0.4029956 0.05703407  0.2978355  0.5178994
## psi_149175 0.4029956 0.05703407  0.2978355  0.5178994
## psi_156266 0.4029956 0.05703407  0.2978355  0.5178994

You can see that fitted() returns the back-transformed beta estimates, standard errors, and confidence intervals on a site-by-site basis.
All sites have a maximum likelihood estimate for $$\psi$$ = 0.4029956. Although you are viewing only the first 10 sites, all of the sites have the exact same estimate because our psi formula was $$\psi$$ ~ 1. Similarly, we can look at the ‘p’ estimates:

head(fitted(object = mod1, param = 'p'), n = 10)

##                 est        se lower_0.95 upper_0.95
## p1_27530  0.8933103 0.0580739  0.7172243  0.9650845
## p1_50339  0.8933103 0.0580739  0.7172243  0.9650845
## p1_65048  0.8933103 0.0580739  0.7172243  0.9650845
## p1_79081  0.8933103 0.0580739  0.7172243  0.9650845
## p1_82884  0.8933103 0.0580739  0.7172243  0.9650845
## p1_103396 0.8933103 0.0580739  0.7172243  0.9650845
## p1_108217 0.8933103 0.0580739  0.7172243  0.9650845
## p1_114488 0.8933103 0.0580739  0.7172243  0.9650845
## p1_149175 0.8933103 0.0580739  0.7172243  0.9650845
## p1_156266 0.8933103 0.0580739  0.7172243  0.9650845

Here we see that $$p_1$$ has an estimate of 0.8933103. Because this is a single season occupancy analysis where we did not include site covariates (such as elevation or patch size) or survey covariates (such as observer or weather conditions), we can simply use the unique() function to extract one value for each parameter of interest:

unique(fitted(mod1, param = 'psi'))

##                 est         se lower_0.95 upper_0.95
## psi_27530 0.4029956 0.05703407  0.2978355  0.5178994

unique(fitted(mod1, param = 'p'))

##                 est         se lower_0.95 upper_0.95
## p1_27530  0.8933103 0.05807390  0.7172243  0.9650845
## p2_27530  0.6617111 0.08676183  0.4778312  0.8069934
## p3_27530  0.7940536 0.07473524  0.6115473  0.9042400

And there is your answer. These are the single season maximum likelihood estimates for our troll data when we run a model that uniquely estimates $$\psi$$, $$p_1$$, $$p_2$$, and $$p_3$$. Let’s plot these results.
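Before plotting, note that these real-scale estimates are just the back-transformed betas from coef(). With the SURVEY covariate coded as offsets from the survey-1 intercept (our reading of the coef() output, since p2 and p3 are reported relative to the p1 intercept), the fitted values can be reproduced by hand with plogis(), R's inverse-logit:

```r
# beta estimates reported by coef() above
beta_psi <- -0.392999
beta_p   <- c(2.125009, -1.454080, -0.775474)  # intercept, survey-2 and survey-3 offsets

# back-transform to the probability scale
plogis(beta_psi)               # psi -> about 0.403
plogis(beta_p[1])              # p1  -> about 0.893
plogis(beta_p[1] + beta_p[2])  # p2  -> about 0.662
plogis(beta_p[1] + beta_p[3])  # p3  -> about 0.794
```

This is exactly the betas-to-parameters step in the pipeline diagram shown earlier.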
First, let’s create a dataframe that can be sent to ggplot():

# extract the psi content
data <- data.frame(unique(fitted(mod1, param = 'psi')))

# add the detection rows
data <- rbind(data, unique(fitted(mod1, param = 'p')))

# add a column called Parameter
data$Parameter <- c("Psi", "p1", "p2", "p3")

# look at the data
data
##                 est         se lower_0.95 upper_0.95 Parameter
## psi_27530 0.4029956 0.05703407  0.2978355  0.5178994       Psi
## p1_27530  0.8933103 0.05807390  0.7172243  0.9650845        p1
## p2_27530  0.6617111 0.08676183  0.4778312  0.8069934        p2
## p3_27530  0.7940536 0.07473524  0.6115473  0.9042400        p3

Next, we’ll plot the estimates as bars, and add in their 95% confidence limits:

library(ggplot2)
ggplot(data, aes(x = Parameter, y = est)) +
  labs(x = "Parameter", y = "Estimate") +
  geom_bar(stat = 'identity') +
  geom_errorbar(aes(ymin = lower_0.95, ymax = upper_0.95), width = 0.2) +
  theme_bw()