Computing biology or biology computing?

Text written by William Ou, PhD candidate at UBC and BIOS2 Fellow since 2021.

Just before the start of the spring term, my supervisor asked if I would be interested in giving a guest lecture on computational ecology in her Ecological Methodology class. She knew I was interested in teaching and that simulation-based research methods are becoming popular yet were missing from the course syllabus, so she figured it would be a great opportunity for everyone.

As an aspiring science communicator, I jumped on this opportunity immediately. However, my eagerness to take on this invitation also comes, in part, from a place of spite. Spite because I never really considered myself a computational ecologist in the traditional sense (I will expand on this later). Since I suspect most people probably don’t know what else computation in computational ecology could be, other than a tool for studying ecology, I thought I could take this opportunity to offer a new perspective. Inspired by the engagement I received from the class, I thought I’d expand on the material I shared in the form of a blog post!

Unlike most modern-day computational biologists, I don’t run shell scripts, build any cutting-edge statistical models, train neural networks, have a HUGO website, write in LaTeX, or do anything that requires HPC (although I would love to at some point!). For exactly this reason, I’ve been hesitant to introduce, or even consider, myself as a computational biologist. As a BIOS2 fellow, you can imagine how much anxiety I get from feeling like an impostor! But the thing is, although I don’t use, or think I’m an expert in, any of these computational tools, I do think deeply about them on a conceptual level. In a sense, I use computation as a conceptual tool. As an example, it might sound ridiculous to think about the “concept” of HUGO websites, but it’s really not that absurd. Okay, maybe HUGO websites specifically are a bit ridiculous, but the concept of websites is actually extremely fascinating.

What makes a CV hosted on a website different from one on a piece of paper? Assuming the contents of the CVs are completely identical, then we might say that there is no difference, at least from the contents of the CV itself. In the business, the content of the CV is what’s called syntactic information and is what Shannon entropy measures (aka the Shannon diversity that we ecologists all love). But if you’re a nit-picker, you might say, “Hold on, one has text represented by ink while the other by LED lights!” And you would be absolutely correct! But this is precisely the beauty of Alan Turing’s proof that computation is substrate-independent: syntactic information can be represented (or computed) in any form of media. In the words of Max Tegmark, “matter doesn’t matter”. This profound insight of substrate-independence is exactly what got me interested in the universality of computation, computation beyond what goes on in silicon chips, and in particular, computation in living systems.

Optimization

Before I discuss computation as a conceptual tool, perhaps it’s best to start at the heart of computers: optimization. At least as I see it, computers were built so that we can delegate tasks, ones that are often repetitive, tedious, and mindless, to machines and focus our attention on more urgent matters like watching Netflix. Instead of organizing all your receipts and doing arithmetic to derive summary statistics, you can now automate this whole process by having computer programs read your e-receipts (or scan a QR code) and calculate the statistics in a split second. Accounting has now been optimized.

Now, swap “accounting” with an ecological problem and, at least in principle, you can optimize it with computers. In many instances, this problem is masked under the name of least squares, used when we fit statistical models to our data. By finding the parameter values that correspond to the smallest squared error, the optimal solution is found. The utility function in economics, the fitness function in evolution, or, more generally, the cost/loss function in ML lingo are all of the same sort. These functions essentially serve as the objective criterion that evaluates how well a particular solution solves a given problem. While some of these problems have analytical solutions, many don’t, or the solutions are hard to find! This is where numerical approximation, iteratively trying out possible solutions using computers, becomes really handy. As a trivial example, try finding the line of best fit by manually adjusting the slope and intercept terms: Numerical Approximation. Much like how you iteratively try out specific combinations, we can write algorithms that tell computers to do exactly that for us.
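To make this concrete, here is a minimal sketch in Python (toy data and a step size I invented for illustration, not something from the lecture) of the computer doing that iterative adjusting for us: it nudges the slope and intercept downhill on the squared error until the line fits.

```python
import numpy as np

# Toy data: a noisy linear relationship (purely illustrative values).
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(0, 2, size=x.size)

def squared_error(slope, intercept):
    """The objective criterion: how badly a candidate line fits the data."""
    return np.sum((y - (slope * x + intercept)) ** 2)

# Iteratively improve the slope and intercept by gradient descent,
# i.e. let the computer do the "manual adjusting" for us.
slope, intercept = 0.0, 0.0
learning_rate = 0.0005
for _ in range(20000):
    residuals = y - (slope * x + intercept)
    slope += learning_rate * np.sum(residuals * x)   # nudge slope downhill
    intercept += learning_rate * np.sum(residuals)   # nudge intercept downhill

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, "
      f"SSE = {squared_error(slope, intercept):.1f}")
```

The exact update rule doesn’t matter much here; the point is that “try a combination, check the error, adjust” is exactly the kind of tedious loop we happily hand over to a machine.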

Data-driven modeling

The example above shows how we might use computers to find the optimal parameter values of a pre-specified model that best fits our data. But how do we know that this a priori model (i.e. a linear model) is the best we can do? Instead of just finding parameter values, the process of building the model itself can also be delegated to a computer! This idea is at the heart of data-driven modeling. Instead of having computers sieve through just numbers, we can have them sieve through mathematical operators too and let the algorithm evolve organically, finding operators and parameter values that correspond to an optimal, and hopefully sensible (!), solution. This type of approach becomes especially indispensable when we want to consider the physical constraints of the real world (e.g. gravity, conservation of mass, etc.) to help us understand the causal mechanisms that generated the data we observe. To put it more concretely, although both Kepler’s and Newton’s laws (i.e. models) predict the orbits of planets in the solar system well (i.e. fit the data well), the explanatory frameworks behind the predictions are different. In particular, the relationship between momentum and energy in Newton’s model allowed it to be extrapolated beyond planets and describe the motion of any object with mass. In ecology, this is equivalent to fitting sine waves to predator-prey cycling data instead of coupled differential equations; both can fit the data well, but only one of them contains biologically meaningful parameters.
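As a caricature of this idea, here is a small Python sketch (again with made-up data) in which the computer sieves through a tiny library of candidate model structures, a linear trend, a quadratic, and a cycle, and scores each by how well it fits. Real data-driven tools such as symbolic regression search far richer spaces of operators, but the principle is the same: the model structure itself is up for grabs.

```python
import numpy as np

# Toy "observations": a cyclic signal, loosely standing in for census counts.
rng = np.random.default_rng(1)
t = np.linspace(0, 20, 200)
obs = 10 + 4 * np.sin(t) + rng.normal(0, 0.5, size=t.size)

# A small library of candidate model structures (design matrices) that the
# computer can sieve through instead of us committing to one a priori.
candidates = {
    "linear trend": np.column_stack([np.ones_like(t), t]),
    "quadratic":    np.column_stack([np.ones_like(t), t, t**2]),
    "cycle (sine)": np.column_stack([np.ones_like(t), np.sin(t), np.cos(t)]),
}

# For each structure, find its best-fit coefficients, then score it by squared
# error (a stand-in for fancier criteria such as AIC or cross-validation).
for name, X in candidates.items():
    coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
    sse = np.sum((obs - X @ coef) ** 2)
    print(f"{name:13s} SSE = {sse:8.1f}")
```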

The beauty of the data-driven modeling approach is that it requires minimal assumptions about the phenomena we are studying and allows us to explore the space of possible explanations. In ecology, species are constantly evolving, interacting with themselves, other species, and the environment. This type of data-driven approach becomes even more indispensable because where do we even begin to write down an equation that encompasses all this complexity? Moreover, the complexity of ecological systems often gives rise to chaotic dynamics and to what I believe is at the heart of our science (and the discussion section of every paper): context-dependency. To get a grasp of these high-dimensional systems and their context-dependencies, several authors have suggested that perhaps there is no single unified uber model in ecology, and that we should just embrace its context-dependency and take a data-driven approach where we constantly collect data and update our models (Ye et al 2015; Dietze et al 2018). In fact, there is an ecologist who suggests we throw away equations altogether.

Biology computing

You might have noticed that the data-driven modeling paradigm described above sounds rather Bayesian, doesn’t it? More precisely, it works under the assumption that there is no single, globally best model (or that it’s unattainable). Instead, it takes an iterative approach where current models are constantly being updated or re-written in light of new data or evidence. While this iterative approach is foundational to the computational tools we use, as I might’ve hinted throughout this post, the iterative algorithm isn’t unique to silicon-based computers. Science itself was an iterative process before modern-day computers came into existence. In fact, the word “computer” used to refer to an occupation held predominantly by women, who, by the way, were instrumental to the discovery of chaos theory (namely, Ellen Fetter and Margaret Hamilton)! While human computers compute to make sense of the world and use the knowledge gained to build roads, launch satellites, and extract resources from the deep sea, it’s hard not to wonder: how do other biological organisms compute to make sense of their world? And how does that impact their ecology and evolution? This is what I refer to as Biology computing.
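As a minimal sketch of that iterative updating (with invented numbers and a deliberately simple conjugate model), here is what “update the model in light of new data” looks like for, say, a belief about a species’ detection probability that gets refined every survey year:

```python
import numpy as np

# Iteratively updating a belief as new observations arrive: here, the
# probability that a species is detected on any given survey visit
# (a conjugate Beta-Binomial model, so each update is a one-liner).
rng = np.random.default_rng(7)
true_detection_prob = 0.3

alpha, beta = 1.0, 1.0              # flat prior: we start out agnostic
for year in range(1, 6):
    visits = 20
    detections = rng.binomial(visits, true_detection_prob)
    alpha += detections             # detections pull the belief one way,
    beta += visits - detections     # non-detections pull it the other
    posterior_mean = alpha / (alpha + beta)
    print(f"year {year}: {detections}/{visits} detections -> "
          f"posterior mean = {posterior_mean:.2f}")
```

No single pass is definitive; the current belief is always provisional and gets re-written as the next batch of data comes in.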

One of the most exciting research directions, one that I think transcends many scientific disciplines, is framing life as information-processing agents. The ability to conduct computations, or process information, has been suggested as a characteristic feature of life. This perspective takes the position that all organisms, whether single-celled or multicellular, non-neural or neural, engage in some form of cognition: sensing stimuli and responding accordingly. By processing information, organisms gain the capacity to not just passively react to the external environment but actively shape and select it. My favourite example of this is the slime mould Physarum spp. In a clever experiment, Saigusa et al (2008) showed that after exposing slime moulds to periodic cycles of favourable and unfavourable conditions, they were able to adjust their behavior preemptively, as if they anticipated the next unfavourable episode! In order to achieve such a feat, slime moulds must possess sensory capabilities and some form of information storage system (i.e. memory), allowing them to integrate observations across time and construct “mental models” that predict their future. With these predictions, they can make informed survival decisions that are, or at least look, intentional. To me, this is a clear demonstration of how organisms can learn patterns in their environment to make informed decisions, like moving away from a location before conditions become inhospitable.
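Just to illustrate the logic, and emphatically not the actual mechanism proposed by Saigusa et al (2008), here is a cartoon in Python of what “memory plus prediction” buys an organism: store when the bad episodes happened, infer their period, and act a little before the next one is due.

```python
import numpy as np

# A cartoon of anticipation: an agent stores the times of past unfavourable
# events (memory), infers the typical interval between them, and acts just
# before the next one is expected. The numbers are made up for illustration.
event_times = [60, 120, 180]        # cold/dry pulses arrived every ~60 min
intervals = np.diff(event_times)
learned_period = intervals.mean()   # the "memory" distilled into a prediction

last_event = event_times[-1]
predicted_next = last_event + learned_period
act_at = predicted_next - 5         # slow down / move away slightly early

print(f"learned period = {learned_period:.0f} min; next event expected at "
      f"t = {predicted_next:.0f}, acting at t = {act_at:.0f}")
```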

It’s easy to get so caught up in the details of our computational tools that we forget where they even came from in the first place. Beginning with observation: sensory proteins were arguably the first form of matter in the universe capable of sensing changes, of making observations about their environment. Analysis, on the other hand, although it seems particularly anthropocentric, is also everywhere in biology. As an obvious example, neural networks, as their name suggests, were inspired by the analytical/cognitive architecture of neurons in animals. Furthermore, evolution by natural selection, a process unique to life, is itself an optimization algorithm, and it forms the basis of a class of algorithms known as Evolutionary Algorithms.
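To show how stripped-down that recipe of variation plus selection can be, here is a bare-bones evolutionary algorithm in Python (the fitness function and parameters are arbitrary toys of my own choosing) that “evolves” a population of candidate solutions toward an optimum.

```python
import numpy as np

# A bare-bones evolutionary algorithm: a population of candidate solutions is
# repeatedly mutated, evaluated by a fitness function, and truncated so that
# only the fittest survive to reproduce. (Toy fitness: peak at x = 3, y = -2.)
rng = np.random.default_rng(0)

def fitness(pop):
    return -np.sum((pop - np.array([3.0, -2.0])) ** 2, axis=1)

pop = rng.normal(0, 1, size=(50, 2))                        # random founders
for generation in range(200):
    offspring = pop + rng.normal(0, 0.1, size=pop.shape)    # mutation
    combined = np.vstack([pop, offspring])
    survivors = np.argsort(fitness(combined))[-50:]         # selection
    pop = combined[survivors]

print("best individual:", pop[np.argmax(fitness(pop))].round(2))
```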

Recognizing these parallels and the importance of information-processing in living systems, many ecologists are now exploring the algorithms that organisms use to survive in noisy environments and what the consequences are for their ecology and evolution (e.g. Hein et al 2016; Bernhardt et al 2020; Little et al 2022). Instead of bringing computation into biology, some are even bringing biology into computers, where silicon chips become the experimental test tubes in which digital organisms grow, interact, and evolve (e.g. Fortuna et al 2013).

Despite being “computational”, scholars working in this area of research are asking fundamentally different questions than the conventional computational biologists who use computational tools to make sense of data. Given what I’ve discussed, do you think this kind of computational biologist warrants a new category of their own? The next time someone tells you that they are a Computational Biologist, be sure to clarify whether they are in fact a Computational Biologist or a Biological Computationist 😉

Photo by Ray Hennessy on Unsplash
