Species-Presence exercise 6 - Two-species-single-season example

This exercise shows how to run program PRESENCE to compute species presence, detectability, and co-occurrence estimates from 'presence-absence' data that include covariates.

Input data consists of 'detection-histories' of two individual species at potential owl territories. Sample covariates have also been included.

Running the program

Start a new project in program PRESENCE.

You're now presented with a 'Results Browser' window where a summary of each model will be saved. To run our first model:

When the 'Setup Numerical Estimation Run' window appears, a design matrix window will also appear. The parameters are grouped by 'Occupancy' or 'Detection'. The Occupancy tab will contain 3 parameters: psiA, psiBA and psiBa. Another way of parameterizing this model is to estimate these parameters: psiA, psiB and phi. By default, the design matrix is set up to estimate the first 3 parameters independently. This allows an interaction in occupancy of the two species (non-independent occupancy). To change the model so that occupancy of the two species is independent, simply constrain psiBA=psiBa (or, in the alternative parameterization, fix phi=1.0).

The Detection tab will contain 5 sets of parameters (indexed by sample):

If the 2nd parameterization is chosen, the following 2 parameters would be estimated in place of the last 2 above:

Run model, "psiA(.),psiBA(.)=psiBa(.),pA(.)=rA(.),pB(.)=rBA(.)=rBa(.)"

This model is one where occupancy and detection of the two species are independent (no interaction). Since we'll be setting psiBA=psiBa, occupancy of species B is the same whether species A is present or not. To do this in PRESENCE we need to delete the last column in the Occupancy design matrix (left-click in the 1st row of the column to be deleted, then right-click and select 'delete column') and enter a '1' in the last row, 2nd column. The design matrix should look like this:

The design matrix for detection should look like this:

Before running this model, change the model name to "psiA(.),psiBA(.)=psiBa(.),pA(.)=rA(.),pB(.)=rBA(.)=rBa(.)". Click 'OK to Run' to run this model.

After the analysis is complete, click 'yes' to append the output to the results browser. The output from this model should match the output you would get if you ran each species separately in a single-season model.
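
As a rough illustration of what the occupancy constraint in this model does, the sketch below applies a 3-row, 2-column design matrix to two logit-scale coefficients. The layout (rows psiA, psiBA, psiBa; columns a1, a2), the coefficient values and the function names are assumptions made for illustration only; this is not PRESENCE code.

```python
import math

def inv_logit(x):
    """Map a logit-scale value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumed layout: rows are psiA, psiBA, psiBa; columns are coefficients a1, a2.
# The last column has been deleted and a 1 entered in the last row, 2nd column,
# so psiBA and psiBa share coefficient a2.
design = [
    [1, 0],  # psiA  = inv_logit(a1)
    [0, 1],  # psiBA = inv_logit(a2)
    [0, 1],  # psiBa = inv_logit(a2)
]

def real_parameters(design, betas):
    """Multiply the design matrix by the coefficients, then back-transform."""
    return [inv_logit(sum(d * b for d, b in zip(row, betas))) for row in design]

# Hypothetical coefficient values, chosen only to show the constraint.
psi_A, psi_BA, psi_Ba = real_parameters(design, [0.4, -0.8])
print(psi_A, psi_BA, psi_Ba)
```

Because the last two rows are identical, psiBA and psiBa are forced to be equal whatever value the second coefficient takes.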

Two species parameterisations

There are currently 3 different parameterisations available for the two species model, which differ only in how they quantify the level of any co-occurrence: the underlying modeling framework is identical in each case. In all cases we can visualise the problem of which species are present at a unit using the following Venn diagram, where psiA and psiB are the overall probabilities of species A and B being present at a unit (the left and right ellipses) respectively, and psiAB is the probability that both species are present (the overlap). Questions regarding the co-occurrence of the species essentially examine the degree of overlap of the ellipses. Using the basic rules of probability we have that the two species are independent if psiA * psiB = psiAB, or alternatively psiAB/psiA = (psiB-psiAB)/(1-psiA) (i.e., Pr(species B present | species A is present) = Pr(species B present | species A is absent)).

The first (original) parameterisation uses the first definition of independence to calculate psiAB as

psiAB = phi*psiA*psiB
where 'phi' is the species interaction factor (SIF). A value of phi = 1 implies independence; phi < 1 implies the species are less likely to occur together than expected under independence; and phi > 1 implies they co-occur more often than expected. This parameterisation is reasonable for many applications without covariates, but can cause some numerical issues once covariates are introduced. The quantities that are estimated by PRESENCE in this parameterisation are logit(psiA), logit(psiB), and log(phi).
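
A minimal sketch of this relationship, using the definition psiAB = phi*psiA*psiB. The function names and the numerical values are hypothetical and are not part of PRESENCE.

```python
def joint_occupancy(psi_a, psi_b, phi):
    """Pr(both species present) under the SIF parameterisation."""
    return phi * psi_a * psi_b

def describe_sif(phi, tol=1e-9):
    """Translate a SIF value into the interpretation given in the text."""
    if abs(phi - 1.0) <= tol:
        return "independent"
    if phi < 1.0:
        return "co-occur less often than expected"
    return "co-occur more often than expected"

# Hypothetical values: psiA = 0.5, psiB = 0.4, phi = 1.5.
psi_ab = joint_occupancy(0.5, 0.4, 1.5)
print(psi_ab, describe_sif(1.5))
```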

The second parameterisation uses the second definition of independence, which essentially compares the proportion of the left ellipse that is overlapped by the right ellipse with the proportion of everything outside the left ellipse that is overlapped by the remainder of the right ellipse. Here there are two probabilities of occupancy for species B, depending on whether species A is also present (psiBA) or absent (psiBa). Under this parameterisation there is no SIF as such (although one could be derived). Instead, psiBA < psiBa implies avoidance; psiBA = psiBa implies independence; and psiBA > psiBa implies the species co-occur more often than expected. The quantities that are estimated by PRESENCE in this parameterisation are logit(psiA), logit(psiBA), and logit(psiBa).
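
The two conditional probabilities can be sketched directly from the Venn diagram quantities, using psiBA = psiAB/psiA and psiBa = (psiB-psiAB)/(1-psiA). The function name and the example values are assumptions for illustration.

```python
def conditional_occupancy(psi_a, psi_b, psi_ab):
    """Return (psiBA, psiBa) from the marginal and joint occupancy probabilities."""
    psi_ba_present = psi_ab / psi_a                    # Pr(B present | A present)
    psi_ba_absent = (psi_b - psi_ab) / (1.0 - psi_a)   # Pr(B present | A absent)
    return psi_ba_present, psi_ba_absent

# Hypothetical values: psiA = 0.5, psiB = 0.4, psiAB = 0.3.
psiBA, psiBa = conditional_occupancy(0.5, 0.4, 0.3)
print(psiBA, psiBa)

# When psiAB = psiA*psiB the two conditional probabilities coincide.
indep_BA, indep_Ba = conditional_occupancy(0.5, 0.4, 0.5 * 0.4)
print(indep_BA, indep_Ba)
```

In the first call psiBA exceeds psiBa, which the text interprets as the species co-occurring more often than expected.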

The final parameterisation is a combination of the first two, where we wish to estimate a SIF, although this is done in terms of an odds ratio for psiBA and psiBa (i.e., the ratio of the two proportions of overlap described above). That is, a SIF ('nu') is defined as

      psiBA/(1-psiBA)             psiAB/(psiA-psiAB)
nu = ----------------- = ----------------------------------
      psiBa/(1-psiBa)     (psiB-psiAB)/(1-psiA-psiB+psiAB)
		
The main advantage of this parameterisation is that it is more numerically stable than the first parameterisation, and it also sits naturally with how one might assess questions of co-occurrence using logistic regression. Using logistic regression, one might be tempted to use the presence/absence of species A as a predictor variable (or covariate) for the presence of species B (the response variable): log(nu) is exactly analogous to the resulting logistic regression coefficient that would be obtained, although in this framework it has been corrected for imperfect detection of both species A and B. Interpretation of nu is the same as phi in the first parameterisation. The quantities that are estimated by PRESENCE in this parameterisation are logit(psiA), logit(psiBa) and log(nu).
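
The odds-ratio form of the SIF, and its link to a logistic regression coefficient, can be sketched as follows. It takes nu to be the odds of psiBA divided by the odds of psiBa; the function names and values are hypothetical.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def logit(p):
    return math.log(odds(p))

def sif_odds_ratio(psi_BA, psi_Ba):
    """nu: odds of B being present when A is present, relative to when A is absent."""
    return odds(psi_BA) / odds(psi_Ba)

def psi_BA_from_nu(psi_Ba, nu):
    """Recover psiBA from the estimated quantities psiBa and nu."""
    o = nu * odds(psi_Ba)
    return o / (1.0 + o)

# Hypothetical values: psiBA = 0.6, psiBa = 0.2.
nu = sif_odds_ratio(0.6, 0.2)
print(nu)

# On the logit scale, log(nu) acts like a regression coefficient for
# "species A present": logit(psiBA) = logit(psiBa) + log(nu).
print(logit(0.6), logit(0.2) + math.log(nu))
```

Because any positive nu combined with a psiBa between 0 and 1 gives a psiBA between 0 and 1, this form cannot produce out-of-range conditional probabilities.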

Which parameterization to use?

The answer to this will depend on the issue you're trying to address. In this section, the 1st parameterization is (psiA, psiBA, psiBa) and the 2nd parameterization is (psiA, psiB, phi); note that this numbering differs from the order used in the previous section. Both parameterizations will (usually) give the same results. Using some algebra, estimates from one parameterization can be converted to estimates in the other. For example, if the 1st parameterization is used, the psiB parameter in the 2nd parameterization can be computed as:
psiB = psiA*psiBA + (1-psiA)*psiBa
and the phi parameter can be computed as:
phi = psiAB/(psiA*psiB) (where psiAB=psiA*psiBA)
So, if the parameters from the 2nd parameterization can be computed using estimates from the 1st parameterization, why even bother with the 2nd parameterization? The main reason would be that you may be interested in modeling one of those parameters in the 2nd parameterization directly as a function of covariates. This cannot be done if the 1st parameterization is used.
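
The conversion between the two parameter sets can be sketched as a pair of functions. The sketch relies on the definition psiAB = phi*psiA*psiB, so that phi = psiAB/(psiA*psiB); the function names and the input values are hypothetical.

```python
def to_marginal_sif(psi_A, psi_BA, psi_Ba):
    """(psiA, psiBA, psiBa) -> (psiA, psiB, phi)."""
    psi_B = psi_A * psi_BA + (1.0 - psi_A) * psi_Ba
    psi_AB = psi_A * psi_BA
    phi = psi_AB / (psi_A * psi_B)
    return psi_A, psi_B, phi

def to_conditional(psi_A, psi_B, phi):
    """(psiA, psiB, phi) -> (psiA, psiBA, psiBa). Results may fall outside 0-1."""
    psi_AB = phi * psi_A * psi_B
    psi_BA = psi_AB / psi_A
    psi_Ba = (psi_B - psi_AB) / (1.0 - psi_A)
    return psi_A, psi_BA, psi_Ba

# Hypothetical estimates from the (psiA, psiBA, psiBa) parameterization.
marginal = to_marginal_sif(0.5, 0.6, 0.2)
print(marginal)

# Converting back should recover the original values.
round_trip = to_conditional(*marginal)
print(round_trip)
```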

Note about the word 'usually' above: With the first parameterization, all parameters are estimated as probabilities (range= 0 - 1). Regardless of the values taken by the parameters, (psiA, psiBA, psiBa), valid values of the parameters, (psiA, psiB, phi) will result. However, there are values of (psiA, psiB, phi) which will result in implausible values of (psiA, psiBA, psiBa). For example, if

psiA=.6  psiB=.6  phi=.278
then
psiAB=phi*psiA*psiB = 0.1
psiBA=psiAB/psiA = 0.1667
psiBa=(psiB-psiA*psiBA)/(1-psiA)  = 1.25
So, the 2nd parameterization might produce estimates which have a higher likelihood, but which correspond to values of psiBA or psiBa that are out of range (here psiBa > 1). PRESENCE will take steps to try to avoid this, but some data sets may be problematic because of it.
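
The worked example above can be checked with a short sketch. It also computes the range of phi for which psiBA and psiBa both stay between 0 and 1, which follows from requiring max(0, psiA+psiB-1) <= psiAB <= min(psiA, psiB). The function names are hypothetical, and this is not a description of the steps PRESENCE itself takes.

```python
def conditional_from_sif(psi_A, psi_B, phi):
    """Return (psiBA, psiBa) implied by (psiA, psiB, phi)."""
    psi_AB = phi * psi_A * psi_B
    return psi_AB / psi_A, (psi_B - psi_AB) / (1.0 - psi_A)

def phi_bounds(psi_A, psi_B):
    """Smallest and largest phi that keep psiBA and psiBa within 0-1."""
    lower = max(0.0, psi_A + psi_B - 1.0) / (psi_A * psi_B)
    upper = min(psi_A, psi_B) / (psi_A * psi_B)
    return lower, upper

def is_plausible(psi_A, psi_B, phi):
    """True when both implied conditional probabilities are valid."""
    return all(0.0 <= p <= 1.0 for p in conditional_from_sif(psi_A, psi_B, phi))

# Values from the example in the text.
ex_BA, ex_Ba = conditional_from_sif(0.6, 0.6, 0.278)
print(ex_BA, ex_Ba)                 # psiBa comes out above 1
print(phi_bounds(0.6, 0.6))         # phi = 0.278 lies below the lower bound
print(is_plausible(0.6, 0.6, 0.278))
```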

Another parameterization

In an attempt to avoid the trouble mentioned above with using the (psiA,psiB,phi) parameterization, another parameterization was developed using an 'odds-ratio' for the SIF. The parameters for this would be (psiA, psiBa, nu), where nu is the odds ratio describing how the odds of occupancy of species B change with the presence of species A (it is estimated on the log scale, as log(nu)).