Blog: A Journey with Data Trekkers

Story of an internship by Gracielle Higino, Gabriel Dansereau and Francis Banville

Back in 2019, which feels like decades ago, we started a humble project in the Poisot lab which we called Code Hour. The goal was to set aside weekly hours to practice Julia, since we were all learning to use it and could greatly benefit from each other’s help and encouragement. The project went well (although we frequently ended up spending much more than one hour). It “spilled” out of our lab and was met with enthusiasm at IVADO, who already had plans to promote a challenge in which participants would make a commitment to code for 100 days. That’s when our internship was born.

When we want to form a new habit, it’s a good strategy to stick to it every day, even briefly. If we want to integrate an exercise routine into our everyday life, it might be more beneficial in the long run to establish a realistic and achievable goal from the very start. For example, instead of running 5 km every day, running 2 km can have a significantly bigger impact. While still challenging, it has a much greater chance of being realized. This is the idea behind the #100DaysOfCode challenge, which started in 2016 as a personal improvement project. The objective of this challenge is to code for at least an hour every day, for 100 days. The idea is to stop rationalizing too much and start doing. Every bit of progress counts, and the public commitment (by sharing your progress through GitHub, for example) encourages you to keep going.

So we came up with this Data.trek challenge along with AEBINUM, mixing ecology, bio-informatics and machine learning (ML), in which participants would work on a project for 100 days. We thought that we, as ecologists, could both greatly benefit from and contribute to this kind of challenge. First, we deal with code every day, but we often don’t take time to just explore and learn new things – code related – in a systematic way. We know how to code for our data, our projects, but sometimes we can’t apply this knowledge in a different context. Second, ML techniques are widely applied in our research: Gabriel uses random forest models to predict species distribution based on eBird data, Francis will use neural networks to model species interactions across space, and Gracielle spent hours wrangling open biodiversity data sets. And, finally, we are used to dealing with other people’s data. Because we are constantly dealing with difficult data sets, we grew data-cleaning-and-visualization muscles that usually take many more years to build.

The Data.trek started on March 5th with a full day of workshops, demonstrations and talks, covering a broad range of topics from an introduction to programming, to machine learning in Julia, Python and R and its applications in environmental and biological sciences. We, as members of the Poisot lab, were especially involved in the introduction to programming and everything related to Julia and R. We spent a couple of hours every week in February to start building the lessons. One week before the event, we did a two-day sprint to finish everything and practice our demonstrations.

Building the lessons was a big plot twist! We thought it would be much simpler because we had mastered the subjects, but it is completely different to be in the learner’s shoes all the time. We had to think of good examples, breaks, challenges, exercises, and a handful of backup plans (in case everything goes wrong, or if the participants are more advanced than we thought, or if there are too many people, or if there are too few…). A whole host of details we had never thought about! Building the lessons really made us split our knowledge into small pieces and reconnect them all over again. In the end, we had a much more solid logical path through what we already knew.

We are now a few days into the 100-day journey, and we will support participants all the way through! Even if every school in Quebec is closed, even if everyone might get sick with the new deadly disease, and even if we cannot have a beer with the participants anymore, we will fight for our Data.trek and stay available to help them complete their projects and answer all their questions online. We will also conduct and share online workshops on different topics throughout the journey.

This internship really brings together what we love to do every day, which is to code and share what we know, while learning a lot too. If you want to be up to date with what Data.trekkers are up to, you can follow the #datatrek hashtag, or subscribe to @poisotlab, @_AEBINUM_ and @IVADO_Qc on Twitter.