Chicken-or-egg: Which came first, the question or the model?

The questions we ask

The quest for generality is at the core of ecology. Once we observe some relationship between X and Y, we usually want to know how general that relationship is, and how much it can vary across conditions, whether these are ecosystems, species, individuals, sites, or other groupings. In other words, we are interested in separating the variation we observe into (1) a general relationship between X and Y that explains what we see most of the time, and (2) the conditions under which this general relationship may differ. These conditions can be ecological (e.g. species may differ in the way they respond to X), but they can also arise from the process of observing the relationship (e.g. Y may respond to X as expected in sites A and B but not in sites C and D, for reasons we haven’t measured).

As an example, we know that plants generally grow faster when they are exposed to more light. If we go out and measure plant growth rates in sites A, B, C, and D, we may see that this general relationship holds. However, some species may actually grow better in low light, so their relationship with light will differ.

When we monitor biodiversity change, we are often asking a similar question. How are species’ abundances generally responding to global change: are they declining, stable, or growing? Once we know this general response, we also need to know how individual species are doing. Which species are declining the fastest and need immediate conservation interventions? Which ones are stable but not growing, and should continue to be monitored in case this changes? Which ones are growing, and may be recovering after conservation action or may not need any intervention for the time being?

So, our question is generally framed in this way: what is the general relationship, and when and how much can this relationship differ between species?

How we answer these questions

To answer these questions, we build models, collect data, and evaluate how well our models explain the data. For simplicity, let’s explore an example where we want to understand Y ~ X for many species. One approach is to model Y ~ X separately for each species, and then try to combine these models into some kind of average or consensus. We could, for example, look at how the slopes of these models vary across species to make inferences about how the strength of the relationship between X and Y differs between species.
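To make this first approach concrete, here is a minimal sketch in Python using only NumPy. It simulates hypothetical growth-versus-light data for four species (one of which, D, grows better in low light), fits a separate straight line to each species with `numpy.polyfit`, and then averages the slopes. The species names, true slopes, sample sizes, and noise level are all invented for illustration.

```python
import numpy as np

# Hypothetical simulated data: growth rate (Y) vs light (X) for four species.
# Species D is given a negative slope, i.e. it grows better in low light.
rng = np.random.default_rng(42)
true_slopes = {"A": 2.0, "B": 1.5, "C": 1.8, "D": -0.5}
n_per_species = 30

xs, ys, sps = [], [], []
for sp, slope in true_slopes.items():
    x_sp = rng.uniform(0, 10, size=n_per_species)
    xs.append(x_sp)
    ys.append(1.0 + slope * x_sp + rng.normal(0, 1.0, size=n_per_species))
    sps.append(np.full(n_per_species, sp))
x, y, species = np.concatenate(xs), np.concatenate(ys), np.concatenate(sps)


def per_species_slopes(x, y, species):
    """No pooling: fit a separate straight line Y ~ X for each species."""
    slopes = {}
    for sp in np.unique(species):
        mask = species == sp
        # np.polyfit with deg=1 returns the coefficients as [slope, intercept]
        slope, _intercept = np.polyfit(x[mask], y[mask], deg=1)
        slopes[str(sp)] = slope
    return slopes


slopes = per_species_slopes(x, y, species)
average_slope = np.mean(list(slopes.values()))  # equal weight per species
print({sp: round(float(b), 2) for sp, b in slopes.items()})
print("average slope:", round(float(average_slope), 2))
```

Each slope here is estimated from that species' data alone, and the simple average gives every species equal weight regardless of how precisely its slope is known.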

Another approach is to build a single model that disentangles the Y ~ X relationship into a general component, which summarises the shared trajectory of all species, and a species-level component, which estimates how the Y ~ X relationship depends on species identity. This is, essentially, a hierarchical model: we estimate (1) the relationship shared across species and (2) the variation between species, in one model.
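The sketch below illustrates the idea of a general component plus species-level deviations, under stated assumptions. Rather than fitting a full hierarchical model (let alone an HGAM) in one step, it uses a two-stage empirical Bayes shortcut: it assumes each species' slope is drawn from a shared Normal(general slope, tau²) distribution, estimates tau² with the DerSimonian-Laird moment estimator, and then shrinks each species' slope toward the general slope according to how uncertain that slope is. The simulated data and the function names (`simulate`, `fit_line`, `partial_pooling`) are hypothetical, and only slopes are pooled; intercepts are estimated separately per species.

```python
import numpy as np


def simulate(rng, true_slopes, n_obs):
    """Hypothetical growth (Y) vs light (X) data for several species."""
    xs, ys, sps = [], [], []
    for sp, slope in true_slopes.items():
        x = rng.uniform(0, 10, size=n_obs[sp])
        xs.append(x)
        ys.append(1.0 + slope * x + rng.normal(0, 1.0, size=n_obs[sp]))
        sps.append(np.full(n_obs[sp], sp))
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(sps)


def fit_line(x, y):
    """OLS slope for one species and the sampling variance of that slope."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)        # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of (intercept, slope)
    return beta[1], cov[1, 1]


def partial_pooling(x, y, species):
    """Two-stage empirical Bayes approximation of a random-slopes model."""
    groups = np.unique(species)
    fits = [fit_line(x[species == g], y[species == g]) for g in groups]
    est = np.array([f[0] for f in fits])   # per-species (no-pooling) slopes
    var = np.array([f[1] for f in fits])   # their sampling variances

    # Between-species variance tau^2 (DerSimonian-Laird moment estimator)
    w = 1.0 / var
    mu_fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - mu_fixed) ** 2)
    denom = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(groups) - 1)) / denom)

    # General component: precision-weighted mean slope across species
    w_re = 1.0 / (var + tau2)
    global_slope = np.sum(w_re * est) / np.sum(w_re)

    # Species-level component: each slope is shrunk toward the general slope;
    # noisier estimates (larger var) keep a smaller share of their deviation.
    shrink = tau2 / (tau2 + var)
    pooled = global_slope + shrink * (est - global_slope)
    return global_slope, dict(zip(groups, pooled)), dict(zip(groups, est)), tau2


rng = np.random.default_rng(1)
true_slopes = {"A": 2.0, "B": 1.5, "C": 1.8, "D": -0.5}
n_obs = {"A": 30, "B": 30, "C": 30, "D": 8}
x, y, species = simulate(rng, true_slopes, n_obs)

global_slope, pooled, raw, tau2 = partial_pooling(x, y, species)
print(f"general slope: {global_slope:.2f}, between-species variance: {tau2:.2f}")
for sp in raw:
    print(f"{sp}: no pooling {raw[sp]:.2f} -> partial pooling {pooled[sp]:.2f}")
```

Species D has only 8 observations in this simulation, so its slope is less precise and a larger share of its deviation from the general slope is shrunk away. A true one-step hierarchical model would estimate the general and species-level components jointly, rather than in two stages as in this approximation.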

Which comes first? The chicken or the egg?

Over the last decade, we’ve seen a burst in modeling capacity in ecology. We have big datasets, more powerful computers, a wide catalogue of statistical modeling tools and literature to back it up, and importantly – a ton of questions we want to answer.

These questions should, of course, always be based on our previous understanding of the relationships between Y and X, drawn from the literature and from our observations of nature. Our questions are supposed to shape the model we build, but we have to admit that sometimes the model shapes the question first. Maybe our labs use certain models that we’ve adopted ourselves, and we find ways to apply them to new datasets. Maybe our field loves a certain approach (community ecologists love a good PCA; no shame here, I love a PCA too), so we reach for it first. Maybe we think a newly published modeling approach is fancy and cool, and want to find a way to use it in our research. Honestly, these are all fine.

But we are wondering: Are our research questions sometimes limited by our model “comfort zone”, and if so, how do we push past this?

In other words, like the chicken and the egg: Which came first, the question or the model?

Are there questions that we aren’t yet asking because we don’t know how to answer them? Are some of these questions actually possible to answer now that we have an abundance of models, types and sizes of datasets, and powerful computers?

Join a community call!

If you’re interested in talking more about these models, please join our community calls during March and April 2025! We welcome anyone interested in GAMs, computational ecology, or eager to learn more about HGAMs to participate in the following sessions:

These community calls are intended to help us face this question. Each discussion will focus on the outstanding ecological questions that we could answer with HGAMs, highlighting a wide array of potential applications for specific types of ecological and evolutionary data. Join us in thinking about how we could use HGAMs to push ecological research forward! You do not need any background with hierarchical modeling or generalized additive models to join these discussions.

Each discussion will follow this structure:

Activity (duration)
Welcome & scope (10 mins)
Individual reflection before the small-group discussion (5 mins)
Small-group discussion (15 mins): What question do you usually ask? What question would you like to ask next? What model(s) do you use to answer your questions? Could some of your questions (or questions you’d like to ask) be addressed with a hierarchical model, or a hierarchical GAM?
Whole-group discussion of the small-group findings (25 mins)
Wrap-up & next steps (5 mins)
Close

What is this all for?

Our intention is to collaboratively write a Perspective paper to highlight the outstanding questions in ecology that could be explored with hierarchical GAMs. 

Please fill out this form to let us know how you would like to participate (or not) in the next steps.

All participants who contribute to the notes and discussions will be credited in the acknowledgements of the paper, unless they tell us otherwise.

Authored by Camille Lévesque and Katherine Hébert

HGAMs working group community calls

We invite you to join our series of discussions on the applications of Hierarchical Generalized Additive Models (HGAMs) in ecology. These discussions are part of a broader initiative led by BIOS2’s HGAMs working group, aimed at promoting the understanding and application of these models. 

We welcome anyone interested in GAMs, computational ecology, or eager to learn more about HGAMs to participate in the following sessions:

Each discussion will focus on the outstanding ecological questions that we could answer with HGAMs, highlighting a wide array of potential applications for specific types of ecological and evolutionary data. Join us in thinking about how we could use HGAMs to push ecological research forward!

Hierarchical Generalized Additive Models

On March 3rd, 2025, BIOS² will host a new training on hierarchical generalized additive models (HGAMs) by fellows Camille Lévesque and Katherine Hébert.

This course is designed to demystify hierarchical models as powerful tools for modelling population dynamics, spatial distributions, and other non-linear relationships in your ecological data. The training will be divided into two blocks. First, we will cover hierarchies in biology, in data, and in models, to understand what hierarchical models are, some of the forms they can take, and the fundamentals of how they work. Second, we will introduce latent variable modelling as a way to explain even more of the variation in our response variables and better disentangle the hierarchies of variation in our data. Both blocks will include a theoretical presentation followed by hands-on coding exercises to implement and interpret hierarchical GAMs.

This training will be given in English, and the coding exercises will be done in R. We recommend installing R and RStudio prior to the workshop, and will send more detailed instructions about packages to install and data to download in the days before the workshop.

We recommend some previous experience with GAMs before taking this training. If you would like an introduction to GAMs before this workshop, please have a look at Eric Pedersen’s Introduction to GAMs (https://bios2.usherbrooke.ca/2021/10/20/workshop-gams-2021/) or the Québec Centre for Biodiversity Science’s Workshop 8: GAMs (http://r.qcbs.ca/workshop08/book-en/), or take the BIOS²+QCBS hybrid training that will happen on February 28th, 2025.

What: Short course on hierarchical generalized additive models (3h)
When: March 3, 1pm-4pm EST (includes a 15-minute break)
Where: Online on Zoom
Registration: https://us02web.zoom.us/meeting/register/_JZMQEXhRaqxVvCKUqM4jA

Workshop R and Git: from code to collaboration

BIOS2 is holding a one-day workshop on programming and version control for data science and research. Our newly certified Carpentries Instructors, Francis Banville and Gabriel Dansereau, will give an introduction to the Tidyverse, a collection of R packages for data manipulation and visualisation. They will also introduce Git, a version control and collaboration tool useful for research work. The lessons will be inspired by those developed by The Carpentries (Data Analysis and Visualization in R for Ecologists, and Version Control with Git).

The workshop is free and will take place in person at Campus MIL, Université de Montréal (with snacks!). The language of instruction will be French. Places are very limited, so register early!

You must have R, RStudio, and Git Bash installed before the workshop. If you need help with the installation, please email Francis or Gabriel, or arrive 30 to 60 minutes before the workshop starts.

What: Workshop on R (Tidyverse) and Git
When: Saturday 18 January from 9:00 to 17:00 ET
Where: Campus MIL (Université de Montréal), room B-2061
Contacts: Francis Banville (francis.banville [at] umontreal.ca) and Gabriel Dansereau (gabriel.dansereau [at] umontreal.ca)

Registration form:

Dealing with spatial data in R – workshop

The first workshop of the 2024-2025 calendar will be about spatial data analysis, and registrations are now open! Ideal for researchers and data analysts with a foundational understanding of R, this workshop will cover Google Colab, reproducible workflows, biodiversity and spatial datasets available online, shapefile and raster operations, and Google Earth Engine (GEE) integration. To participate, you will need an account with access to Google applications such as Google Drive and Google Colab. The workshop instructors are Lionel Leston (University of Alberta) and Mobina Gholamhosseini (Université de Montréal).

Registration Details

Date: September 17th, 19th, 24th and 26th, 2024
Time: 10 am PT / 1 pm ET
Venue: Online (Zoom and Google Colab)
Registrations: https://us02web.zoom.us/meeting/register/tZUkdOmsqT4vGNTLsbPoZIezkRH2x6Vhc8zJ

Registrations are free and open to everyone, but seats are limited. If you have any questions, don’t hesitate to get in touch with us at pgm_bios2 [at] usherbrooke.ca.

Open Reviewers Workshop

In July, BIOS2 will host a 2-hour workshop on open peer review, led by our fellow Allegra Spensieri. Allegra has been selected as a PREreview Champion and, with the support of PREreview, is bringing the principles of fair and open peer review to our community.

PREreview is an organization that develops infrastructure to support the preprint peer review process from start to finish. They develop and provide training on how to write helpful reviews of preprints, and have built both a platform for submitting preprint reviews and a community of researchers. The Open Reviewers workshop is a training program that PREreview designed for researchers at all career stages to learn how to write equitable peer reviews.

This 2-hour workshop will focus on the basics of open preprint peer review and on recognizing biases in peer review. Registrations are free and open to everyone, but seats are limited.

When: July 10th, 2024 – 1pm EST
Where: Online
Registrations: https://us02web.zoom.us/meeting/register/tZIoce6tqz0vGtdpZPVtuBKMpLyxjR9WvqjN