Introduction

This code prepares camera trap data into count history matrices and their associated covariates. This is a necessary first step towards implementing Zachary Amir’s co-abundance models on the High-Performance Computers (HPC).

This code works from the spatially re-sampled captures and covariates generated in the 4-step Southeast Asian camera trap data standardization pipeline. To learn more about this pipeline, please contact Zachary Amir or Matthew Luskin to request access to the relevant GitHub repository.

Camera trap data included for analysis

The camera trap data used in this analysis come from 13 data providers that sampled 45 spatially distinct landscapes across a total of 282 temporally distinct survey_ids, each defined to be a maximum of 100 days long. The dataset comprises 10503 single-season camera deployments. The single-season surveys ranged in duration from 13 to 107 days, with an average of 85.77 and SD of 23.92 days. In terms of trap-nights (i.e., the number of deployments multiplied by duration), sampling effort per survey ranged from 100 to 6237 trap-nights, with an average of 1247.89 and SD of 1139.98 trap-nights.

We have spatially re-sampled the camera trap captures and covariates to the 5km spatial scale, meaning that any cameras within the same 5km hexagon had their captures aggregated. There are a total of 4062 5km sampling units available for analysis, and each landscape contains between 3 and 723 sampling units, with an average of 90.27 and SD of 127.08 per landscape. But remember, sampling units are temporal, so the same location can be repeated multiple times through time. Therefore, to assess landscape sizes, it is better to inspect the number of 5km polygons sampled. Each landscape contains between 3 and 168 polygons, with an average of 31.64 and SD of 37.6 per landscape.
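The aggregation step can be pictured as a simple group-and-sum, where a sampling unit is a hexagon within a survey. The actual pipeline is written in R; the sketch below is an illustrative Python stand-in with hypothetical field names and toy counts.

```python
# Illustrative sketch (hypothetical field names): pool camera-level captures
# into 5km hexagon sampling units within each survey.
from collections import defaultdict

# (survey_id, camera_id, hex_5km_id, species, n_captures)
captures = [
    ("S1", "C1", "H01", "Sus_scrofa", 3),
    ("S1", "C2", "H01", "Sus_scrofa", 2),    # same hexagon, pooled with C1
    ("S1", "C3", "H02", "Cuon_alpinus", 1),
    ("S2", "C1", "H01", "Sus_scrofa", 4),    # same polygon, later survey
]

units = defaultdict(int)
for survey, _cam, hexagon, species, n in captures:
    # sampling units are temporal: the same polygon can recur across surveys
    units[(survey, hexagon, species)] += n

print(units[("S1", "H01", "Sus_scrofa")])  # pooled count: 5
```

This is why the count of distinct polygons (3 - 168 per landscape) is smaller than the count of sampling units: the same hexagon re-enters the data each time it is re-surveyed.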

As part of ensuring we are using robust data, a landscape needs to be at least 9 km2 to provide spatial variation, and a survey needs to contain at least 100 trap-nights. Therefore, a total of 68 surveys that do not meet these standards have been removed, representing 5.4% of the captures. For the remaining 282 valid surveys, the median number of 5km sampling units is 12 with a range from 3 - 63, and the median sampling effort per survey is 891 trap-nights, with a range from 100 - 6237.
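The two quality thresholds amount to a straightforward filter. A minimal sketch (hypothetical survey records; the real filtering is done in R on the pipeline's metadata):

```python
# Minimal sketch: drop surveys below the robustness thresholds of
# >= 9 km2 of sampled landscape and >= 100 trap-nights of effort.
MIN_AREA_KM2 = 9
MIN_TRAP_NIGHTS = 100

surveys = [
    {"survey_id": "S1", "area_km2": 25.0, "trap_nights": 891},
    {"survey_id": "S2", "area_km2": 4.0,  "trap_nights": 450},  # too small
    {"survey_id": "S3", "area_km2": 50.0, "trap_nights": 60},   # too little effort
]

valid = [s for s in surveys
         if s["area_km2"] >= MIN_AREA_KM2
         and s["trap_nights"] >= MIN_TRAP_NIGHTS]
print([s["survey_id"] for s in valid])  # ['S1']
```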

Determine which species to analyze

In these code chunks, I will determine which species have sufficient captures (n ≥ 100), standardize similar species to the genus level (e.g., clouded leopards in the genus Neofelis), and remove non-relevant ‘species’ (e.g., blanks).

I will also gather species traits from the library(speciestraits) databases. Each species will have 1) a trophic level (i.e., carnivore, omnivore, herbivore), 2) body mass (in grams), and 3) home range size (in km2). Finally, I have also added dietary preferences for the four large carnivore species that are the focus of this analysis.

Before any standardization there are a total of 106 ‘species’ with at least 100 captures ready for analysis, but after removing domestic species (e.g., Canis lupus familiaris & Bos taurus), primarily arboreal species (e.g., Pongo pygmaeus & Presbytis rubicunda), species weighing less than 1 kg or more than 1000 kg, and species lacking any trait data, we are left with a total of 44 species ready for analysis. The total number of independent captures for these species is 133694.
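The species-inclusion rules chain together as one compound filter. The sketch below is an illustrative Python stand-in (the pipeline is in R) with hypothetical trait records; the thresholds are the ones stated above.

```python
# Hedged sketch (hypothetical trait records): keep species with >= 100
# independent captures, a 1-1000 kg body mass, and complete trait data,
# after excluding domestic and primarily arboreal species.
species = [
    {"name": "Sus_scrofa",       "captures": 5000, "mass_kg": 80,
     "domestic": False, "arboreal": False, "trophic": "omnivore"},
    {"name": "Bos_taurus",       "captures": 300,  "mass_kg": 600,
     "domestic": True,  "arboreal": False, "trophic": "herbivore"},  # domestic
    {"name": "Pongo_pygmaeus",   "captures": 150,  "mass_kg": 60,
     "domestic": False, "arboreal": True,  "trophic": "omnivore"},   # arboreal
    {"name": "Tupaia_glis",      "captures": 200,  "mass_kg": 0.15,
     "domestic": False, "arboreal": False, "trophic": "omnivore"},   # < 1 kg
]

kept = [s["name"] for s in species
        if s["captures"] >= 100
        and 1 <= s["mass_kg"] <= 1000
        and not s["domestic"]
        and not s["arboreal"]
        and s["trophic"] is not None]
print(kept)  # ['Sus_scrofa']
```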

A total of 6 trophic guilds were constructed based on trophic level and body mass. Unfortunately, home-range data were lacking for 32 species. For the species with known home-range values, the median is 3.515 square kilometers, with a range of 0.46 - 64.89.

There are a total of 15 herbivores, split between 11 small herbivores and 4 large herbivores using a 100 kg cut-off.

There are a total of 17 omnivores, split between 13 small omnivores and 4 large omnivores using a 50 kg cut-off.

There are a total of 12 carnivores, split between 8 small carnivores and 4 large carnivores using a 15 kg cut-off.
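The six guilds above come from crossing trophic level with a level-specific body-mass cut-off. A small sketch of that rule (illustrative only; whether the cut-off itself counts as "large" is my assumption, and the real assignment happens in R):

```python
# Sketch of the guild construction: trophic level plus a per-level
# body-mass cut-off (100 kg herbivores, 50 kg omnivores, 15 kg carnivores).
# Treating a species exactly at the cut-off as "large" is an assumption.
CUTOFF_KG = {"herbivore": 100, "omnivore": 50, "carnivore": 15}

def guild(trophic_level, mass_kg):
    size = "large" if mass_kg >= CUTOFF_KG[trophic_level] else "small"
    return f"{size}_{trophic_level}"

print(guild("herbivore", 700))  # large_herbivore
print(guild("carnivore", 5))    # small_carnivore
```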

Tigers (Panthera tigris) generated 1838 independent detections from 7 landscapes. They were deemed to prefer any species that is not a large carnivore and weighs more than 17 kg. This includes a total of 9 species composed of Helarctos_malayanus, Rusa_unicolor, Sus_barbatus, Sus_scrofa, Tapirus_indicus, Ursus_thibetanus, Bos_gaurus, Capricornis_genus, Muntiacus_genus.

Leopards (Panthera pardus) generated 230 independent detections from 4 landscapes. They were deemed to prefer any species that is not a large carnivore and falls within a weight range of 10-40 kg. Also, all primates were included as preferred prey. This includes a total of 7 species composed of Arctictis_binturong, Arctonyx_collaris, Macaca_arctoides, Macaca_fascicularis, Macaca_nemestrina, Sus_scrofa, Muntiacus_genus.

Dholes (Cuon alpinus) generated 795 independent detections from 9 landscapes. They were deemed to prefer any species that is not a large carnivore and falls within a weight range of 40-190 kg. Also, all bear species were removed as preferred prey and muntjac deer were included. This includes a total of 5 species composed of Rusa_unicolor, Sus_barbatus, Sus_scrofa, Capricornis_genus, Muntiacus_genus.

Clouded leopards (Neofelis genus) generated 646 independent detections from 24 landscapes. They were deemed to prefer any species that is not a large carnivore and falls within a weight range of 7-190 kg. Also, all bear species and golden cats were removed as preferred prey. This includes a total of 12 species composed of Arctictis_binturong, Arctonyx_collaris, Hystrix_brachyura, Macaca_arctoides, Macaca_nemestrina, Rusa_unicolor, Sus_barbatus, Sus_scrofa, Viverra_tangalunga, Viverra_zibetha, Capricornis_genus, Muntiacus_genus.
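Each of the four prey lists follows the same pattern: a body-mass window over non-large-carnivore species, plus manual inclusions and exclusions. An illustrative Python sketch of that rule (hypothetical trait records and toy masses; the real lists are built in R from the trait data):

```python
# Rule-based prey lists: mass window over candidates that are not large
# carnivores, then apply manual include/exclude overrides.
def preferred_prey(candidates, min_kg, max_kg, include=(), exclude=()):
    prey = {name for name, (guild, mass) in candidates.items()
            if guild != "large_carnivore" and min_kg <= mass <= max_kg}
    return (prey | set(include)) - set(exclude)

candidates = {  # name: (trophic guild, body mass in kg) -- toy values
    "Rusa_unicolor":       ("large_herbivore", 180),
    "Sus_scrofa":          ("large_omnivore",  80),
    "Helarctos_malayanus": ("large_omnivore",  50),
    "Muntiacus_genus":     ("small_herbivore", 16),
    "Panthera_tigris":     ("large_carnivore", 150),  # never prey
}

# Dhole-style rule: 40-190 kg, bears excluded, muntjac deer added back in.
dhole = preferred_prey(candidates, 40, 190,
                       include=["Muntiacus_genus"],
                       exclude=["Helarctos_malayanus"])
print(sorted(dhole))  # ['Muntiacus_genus', 'Rusa_unicolor', 'Sus_scrofa']
```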

Prepare data to be saved

Now that the relevant species have been determined and the species traits are sorted, I will finalize the data by generating a sampling occasion index used to create count history matrices for each species from the formatted captures and metadata files. Instead of creating a detection history matrix based on the calendar date a photo was taken, I make all cameras start on the same ‘day’ (based on their start/end dates) to increase model speed and efficiency.
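The idea behind the occasion index can be sketched as follows: every detection is dated relative to its own camera's start date rather than the calendar, so every deployment begins on occasion 1 and the count matrix has no ragged calendar offsets. This is an illustrative Python stand-in (the pipeline is in R), and the 7-day occasion length is an assumption, not the pipeline's actual value.

```python
# Sketch: convert photo dates to 1-based occasion indices relative to the
# camera's own start date. OCCASION_DAYS = 7 is an assumed occasion length.
from datetime import date

OCCASION_DAYS = 7

def occasion(start, photo):
    """1-based sampling occasion of a photo, relative to the camera's start."""
    return (photo - start).days // OCCASION_DAYS + 1

start = date(2019, 3, 1)
print(occasion(start, date(2019, 3, 1)))   # 1  (deployment day -> occasion 1)
print(occasion(start, date(2019, 3, 15)))  # 3  (day 15 -> third week)
```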

By adding the observation sequence to the captures, we lost 5% of the captures. This is an acceptable amount.

I have also determined which species need to be analyzed as groups instead of individuals, because very large group sizes cause N-mixture models to crash and produce unrealistic estimates. This includes 9 species: Bos_gaurus, Cuon_alpinus, Macaca_arctoides, Macaca_fascicularis, Macaca_nemestrina, Muntiacus_genus, Rusa_unicolor, Sus_barbatus, Sus_scrofa.

The key spatial variables that will get included in each species’ N-mixture model to account for habitat filtering are: Avg_FLLI_5km, Avg_altitude_5km, Avg_human_footprint_5km. The correlation coefficient between forest integrity and human footprint is -0.564. The correlation coefficient between elevation and human footprint is -0.366. The correlation coefficient between forest integrity and elevation is 0.432.

Save data

Generating the count histories and covariates for each species on a regular computer takes hours, so instead I will send the information to the High-Performance Computers (HPC), where it runs much faster. The data is saved in my personal Dropbox folder, but will eventually move to GitHub.

HPC matrix creation code

The next section of the analysis relies on the HPC, where we convert our covariates and metadata into count history matrices with the proper associated observation- and site-level covariates. We use the HPC to iterate the process per species with more computational power, which vastly speeds up the workflow.

The separate R script is called: scripts/HPC_code/HPC_matrix_generator_SEA_TC.R.

To make that code run on the HPC, I use the SLURM script called: scripts/SLURM_code/SLURM_generate_matricies.txt.