Step 2_generate co-abundance pairwise data bundles

Import data

Import count matrices for each species that was generated on the High Performance Computer (HPC). Objects were saved as .RDS files because they were list(), so I will use the readRDS function to import them into R.

Generate co-abundance data bundles

Now that the data is loaded in our environment, I will use a large for-loop to generate co-abundance data bundles that will be ready to run on the HPC. Despite the loop being very long and complex, it is mainly reorganizing and sorting data rather than running calculations. Therefore, it completes in < 5 min.

Bundles will be generated for species pairs, where one species is explicitly assigned as a dominant species (predictor/independent variable), while the other is the subordinate species (response/dependent variable). Bundles are generated based on spatial associations. Each species pair must have ≥ 100 independent detections of both dominant and subordinate species from at least one shared landscape where both species were detected to ensure co-abundance model outputs are biologically meaningful. Furthermore, species pairs were excluded where the independent detection ratio between the two species exceeded 50:1 in a shared landscape to ensure our results are not skewed by sparse data in either dominant or subordinate species and that both species meaningfully contributed to the results.

We have included a new variable called community_detections across all models, which is simply a site-level variable containing the sum number of independent detections of all 44 species detected at that sampling unit except for the two-species whose co-abundance is considered in the model. The variable has been log-transformed to better normalize outliers and, like all other variables, it has been standardized with a mean of 0 and standard deviation of 1. This variable has been included to captures unmeasured variation in sampling locations, where some sampling units are better than others for a many different factors. This should manifest as more detections of more species at that sampling unit, resulting in a positive effect.

After successfully generating the bundled data, we have a total of 258 pairwise interactions composed of both top-down and bottom-up models.

Reorganize list for efficent analysis

After some preliminary testing, I realized that all co-abundance models don’t need the same amount of RAM. Most complete w/ 100 GB, but some need 250 or even 500 GB to complete. Based of prior testing, I will reorganize the list into 100, 250, or 500 GB lists. This process will be achieved by inspecting the OE (output or error) files extracted from the HPC when each job runs.

After splitting and organizing the data bundles, we have a total of 64 preferred co-abundance models and a total of 194 co-abundance models across the larger community.

Save the final list of bundled data

Now that all species pairs are established, the data is properly formatted, and everything is clean, its time to save the data! We will send it to the HPC to run on the R script called: data/HPC_code/HPC_co-abundance_model_final.R

Alternatively, I could save the list as a single object rather than splitting it up.

Generate counterfactual data bundles

After submitting the manuscript and receiving reviews from 3 expert reviewers, 2 reviewers revealed an appetite for causation that was lacking from the current analysis. Moreover, when presenting my findings at conferences and to colleagues, similar sentiments were voiced. Structural Equation Models (SEM) was a logical starting place, but the limitations of the data (i.e., lacking key variables like hunting pressure or resource availability) and inability to include a count history matrix as the response variable (and therefore propagate uncertainty) left me unsatisfied.

Instead, the new goal here is to examine counterfactual realities to understand if prey abundance increases would not have happened if not for large carnivore extirpation (top-down test), AND if predator abundance increases would not have happened if not for prey abundance increases (bottom-up test). Additionally, we can run more variations of the same co-abundance model by sub-setting the cooccuring species data into locations that match in their measured covariates to help control for unmeasured variables.

This approach builds towards the top of the ‘ladder of causation’ proposed by Judea Pearl’s Book of Why.

The first rung, simple associations from data, can be represented simply with a non-directional association between two species (e.g., correlation between RAI of two species).
The second rung, conceptual causal diagrams, can be used to visualize the researchers hypotheses, assumptions, and biases where one variable directionally affects another variable, such as predators suppress prey or prey bolster predators.
The third rung, testing counter-factual realities, forces the researcher to imagine an alternative reality where the assumed causal forces are reversed or eliminated, such as testing if prey abundance increases in a world with no large carnivore extirpation.

I will be running several counter-factual tests from the 64 preferred predator-prey models, and the 31 supported results from the broader community, to examine the causal forces driving observed relationships. The tests include:

What if prey abundance increases would not have happened if not for large carnivore extirpation (top-down test), AND what if predator abundance decreases would not have happened if not for prey abundance extirpations (bottom-up test). This isolates the effect of species extirpations rather than declines, and is tested by only selecting landscapes where species co-occur w/ sufficient detections. By thinning the data to only include sites where species co-occur, we have a total of 95 co-abundance models for this counter-factual test.
Would we observe trophic release in prey or bottom-up bolstering in predators when controlling for landscape variation that may be driving observed trends. This is tested by examining co-abundance relationships from 3 contiguous landscapes that comprise the Thai Eastern Forest Complex. By thinning the data to only include the Thai Eastern Forest Complex and maintaining regular rules for making pair-wise co-abundance models (i.e., >100 detections of both species from one landscape), we have a total of 55 co-abundance models for this counter-factual test.
Would we observe trophic release in prey and bottom-up bolstering in predators when controlling for observed elevation? This is tested by creating pairwise co-abundance bundles where we let this variable vary across its observed range: mean = 460.66, min = 6.59, max = 1588.86, and holding the other two variables around their mean, allowing for 1/2 standard deviation of wiggle room. For Forest Landscape Integrity Index, this ranges from values between 6.18 to 8.91. For Human Footprint Index, this ranges from values between 2.66 to 12.7. For the counter-factual co-abundance models examining the effect of Avg_altitude_5km, we have 94 models.
Would we observe trophic release in prey and bottom-up bolstering in predators when controlling for observed Forest Landscape Integrity Index? This is tested by creating pairwise co-abundance bundles where we let this variable vary across its observed range: mean = 7.55, min = 0, max = 10, and holding the other two variables around their mean, allowing for 1/2 standard deviation of wiggle room. For elevation, this ranges from values between 303.2 to 618.12. For Human Footprint Index, this ranges from values between 2.66 to 12.7. For the counter-factual co-abundance models examining the effect of Avg_FLLI_5km, we have 95 models.
Would we observe trophic release in prey and bottom-up bolstering in predators when controlling for observed Human Footprint Index? This is tested by creating pairwise co-abundance bundles where we let this variable vary across its observed range: mean = 7.68, min = 0, max = 97.32, and holding the other two variables around their mean, allowing for 1/2 standard deviation of wiggle room. For elevation, this ranges from values between 303.2 to 618.12. For Forest Landscape Integrity Index, this ranges from values between 6.18 to 8.91. For the counter-factual co-abundance models examining the effect of Avg_human_footprint_5km, we have 94 models.