
Mechanistic vs. Functional Modeling in Cognitive Neuroscience: Bridging Divides Through Deep Learning

Written by: Davis Hobley



Behavior is rooted in complex neural circuitry. From face-selective regions like the fusiform face area to the dopaminergic reward system, neuroscience has mapped out key components of the brain that are implicated in perceiving, interpreting, and interacting with others (Adolphs, 2009). However, modeling these systems accurately has remained a challenge in neuroscience for decades. Deep learning may represent a promising path toward mechanistic models of the brain (Yamins & DiCarlo, 2016). Diverging from functional models, mechanistic models seek to resemble the inner workings of the brain: not only reproducing its output, but reaching that output in a similar manner. An analog clock and a digital clock will both read 11:15 at the same moment; however, the ways in which they arrive at that reading are drastically different. This makes the digital clock a functional model of the analog clock, as both produce the same output, but not a mechanistic model, as they reach that output in very different ways.


What exactly is a mechanistic model in cognitive neuroscience? It may be easier to start with what makes a model non-mechanistic. For example, when we think about a Large Language Model (LLM), such as ChatGPT, it is generally agreed that the model may be functional, e.g., it produces coherent text and assembles information; however, it does not mechanistically model how language is learned or communicated in humans (Sejnowski, 2023). This is a key distinction in the application of deep learning to neuroscience. So how does a model learn? How did ChatGPT reach functional proficiency at modeling human language without mirroring human development?


LLMs, generally speaking, reached their current level of proficiency through three key scalable factors: model size, model architecture, and training data (Kaplan et al., 2020). Model size refers to the number of parameters a model has. GPT-3 has 175 billion parameters (Brown et al., 2020), and GPT-4 is widely estimated to exceed a trillion. Each parameter can be thought of as a variable in the equation that makes up the model. Each parameter holds a numerical value (often a small number, roughly between -5 and 5) that weights how strongly one unit's activation influences another (Goodfellow et al., 2016). Through the lens of neuroscience, we can think of a parameter as the synaptic strength between neurons in the brain. For example, neuron A may be connected to neurons B, C, and D. When neuron A fires, perhaps neuron B fires 40% of the time, C 60% of the time, and D 5% of the time. This building of relationships between the activations of neurons in the brain is very similar to how parameters define the relationships between units in a model (Richards et al., 2019).

Model architecture pertains to the overall structure and design of a model. For example, in OpenAI's CLIP (Contrastive Language-Image Pre-training), the architecture contains an image encoder and a text encoder. The idea was to allow CLIP to learn how to classify images with text descriptions (Radford et al., 2021). This means CLIP would learn what a dog looked like via images and how to label it as a dog through text. Specifically, the model was performing out-of-distribution classification: taking images it had never seen before and, based on generalizations of the training data, labeling them with a description. The image encoder turns images into long strings of numbers, known as image embeddings. The text encoder performs a similar process with text, creating an embedding to represent the label of an image. Together, the image embedding would represent the picture of a dog while the text embedding would represent the word "dog" (Ilharco et al., 2021). In neuroscience, model architecture can be thought of as the neural circuits that connect different regions of the brain. An example is the arcuate fasciculus, which connects Broca's area (speech production) and Wernicke's area (language comprehension) (Catani & Mesulam, 2008). Lastly, and most importantly for the scope of this paper, there is training data.
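Before turning to training data, the sketch below makes the encoder-and-embedding idea concrete. It is a minimal illustration, not OpenAI's actual implementation: the encoder sizes, embedding dimension, and random "image" and "caption" inputs are placeholder assumptions, and CLIP's learned temperature scaling is omitted. What it does show is the core idea described above: encode each modality, normalize the embeddings, and score image-text pairs by similarity so matching pairs can be pulled together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for CLIP's encoders. Real CLIP uses a vision transformer
# (or ResNet) and a text transformer; here each is a small MLP so the
# sketch stays self-contained. All dimensions are arbitrary assumptions.
image_encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 256))
text_encoder = nn.Sequential(nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, 256))

# A batch of 4 fake "images" and their 4 matching "captions" (random vectors
# standing in for pixel features and token features).
images = torch.randn(4, 2048)
captions = torch.randn(4, 300)

# Encode each modality into a shared embedding space and L2-normalize, so a
# dot product measures how similar an image and a caption are.
image_emb = F.normalize(image_encoder(images), dim=-1)   # shape: (4, 256)
text_emb = F.normalize(text_encoder(captions), dim=-1)   # shape: (4, 256)

# Similarity matrix: entry (i, j) scores image i against caption j.
logits = image_emb @ text_emb.T

# Contrastive objective: each image should match its own caption (the
# diagonal of the matrix) and not the others.
targets = torch.arange(4)
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
print(loss.item())
```

In a full training run this loss would be minimized over many batches, which is what gradually aligns the image and text embeddings.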


Training data is, more often than not, the point of divergence between mechanistic and functional models in deep learning applied to neuroscience (Gonçalves et al., 2020). When we think about a model like ChatGPT, it does contain a system that resembles the architecture of the brain, carrying millions of nodes analogous to neurons and over a trillion parameters. However, the way in which ChatGPT was trained is far from the reality of human learning. ChatGPT was trained on essentially the whole internet's worth of data, including textbooks, Reddit posts, and blogs (OpenAI, 2023). This is not how a human learns. We do not start our journey of language learning by being exposed to every digital file accessible to us. The process is much slower, starting with simple readings, social interactions, and play, gradually scaffolding language as we age (Kuhl, 2004). Even with circuitry that may resemble the brain, the divergence in training data makes ChatGPT a functional model, not a mechanistic one.


That being said, the creation of mechanistic models is well underway. One model that comes close to mechanistic plausibility is the VG-W2V2 model. This model is remarkable in quite a few ways, demonstrating that advanced deep learning models can be trained mechanistically. VG-W2V2 is fed data from 123,000 crowd-sourced images paired with read-aloud image descriptions. Similar to CLIP, the goal is for the model to generalize image descriptions over time, allowing it to analyze an image of a kitchen it has never seen before and label it as a kitchen (Khorrami & Räsänen, 2024). Based on what I've described so far, this should raise an important question: isn't exposure to 123,000 described images extremely unrealistic for a child within a compact span of time? Yes, and that is exactly what the authors thought while creating the model. So, the model was only exposed to about six hours of spoken language a day, similar to how much a child would hear. Note that the model had no prior understanding of language when it was built: it learned to encode speech directly from raw audio waveforms during training, a feat that has been accomplished before (Baevski et al., 2020), but rarely within a model that is simultaneously learning to match images with spoken descriptions (a contrastive model). This means that, with roughly the same exposure to images and spoken language an infant may have, the model learned language, learned to associate descriptions with images, and was able to generalize these findings to out-of-distribution cases. It represents a strong step toward a mechanistic model of human language development.
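To show what such an audio-visual contrastive setup might look like, here is a minimal sketch under loose assumptions. It is not the published VG-W2V2 code: the waveform encoder is a toy pair of 1-D convolutions standing in for wav2vec 2.0, the image encoder is a single projection standing in for a vision backbone, and the data are random tensors. It does, however, capture the key property discussed above: the model only ever receives images and raw audio waveforms, never text, and learns by pulling matching image-speech pairs together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveformEncoder(nn.Module):
    """Maps raw audio samples to a single utterance embedding (toy version)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        # Strided 1-D convolutions downsample the waveform, then we pool over
        # time; a real wav2vec 2.0 stack adds transformer layers and masking.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=8, stride=4), nn.ReLU(),
        )
        self.proj = nn.Linear(64, emb_dim)

    def forward(self, wav):                  # wav: (batch, samples)
        h = self.conv(wav.unsqueeze(1))      # (batch, 64, time)
        h = h.mean(dim=-1)                   # average-pool over time
        return F.normalize(self.proj(h), dim=-1)

class ImageEncoder(nn.Module):
    """Placeholder image encoder (a real model would use a CNN or ViT)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.proj = nn.Linear(2048, emb_dim)

    def forward(self, feats):                # feats: (batch, 2048)
        return F.normalize(self.proj(feats), dim=-1)

# One training step on a fake batch: 8 images paired with 8 one-second spoken
# captions at 16 kHz. No text appears anywhere in the pipeline.
speech_enc, image_enc = WaveformEncoder(), ImageEncoder()
wavs = torch.randn(8, 16000)
imgs = torch.randn(8, 2048)

logits = image_enc(imgs) @ speech_enc(wavs).T            # (8, 8) similarities
targets = torch.arange(8)
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
loss.backward()    # gradients flow into both encoders jointly
```

Because speech and vision are trained jointly from raw signals, any "understanding" of spoken words the model ends up with is an emergent property of training rather than something programmed in, which is the point the figure below illustrates.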


Figure 1

VG-W2V2 training data, including audio classification of a crowd-sourced image set


Note: The audio inputs are raw waveforms; the model was not programmed with an understanding of language. Rather, it learned language as an emergent property of training, approaching a mechanistic account of human language development (Khorrami & Räsänen, 2024).


While impressive, this may leave you wondering what the purpose of all this is. Why should we attempt to make models mechanistic with respect to human development and cognitive neuroscience? There are two key reasons. First, these models borrow organizational principles from brains that evolution has shaped over millions of years, potentially allowing them to gain useful abilities rapidly. Second, if we can create a model that is both fully functional and mechanistic, experiments could be run on it as a sort of "model species" for humans, drastically changing the landscape of research.



References

Adolphs, R. (2009). The social brain: Neural basis of social knowledge. Annual Review of Psychology, 60, 693–716. https://doi.org/10.1146/annurev.psych.60.110707.163514

Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2006.11477

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165

Catani, M., & Mesulam, M. M. (2008). The arcuate fasciculus and the disconnection theme in language and aphasia: History and current state. Cortex, 44(8), 953–961. https://doi.org/10.1016/j.cortex.2008.04.002

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org

Gonçalves, P. J., Lueckmann, J.-M., Deistler, M., Nonnenmacher, M., Öcal, K., Bassetto, G., Chintaluri, C., Podlaski, W. F., Haddad, S. A., Vogels, T. P., Greenberg, D. S., & Macke, J. H. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife, 9, e56261. https://doi.org/10.7554/eLife.56261

Ilharco, G., Wortsman, M., Wightman, R., Gordon, C., Carlini, N., Taori, R., ... & Zettlemoyer, L. (2021). OpenCLIP: An open-source reproduction of CLIP. https://github.com/mlfoundations/open_clip

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., ... & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361. https://arxiv.org/abs/2001.08361

Khorrami, K., & Räsänen, O. (2024). A model of early word acquisition based on realistic-scale audiovisual naming events. Speech Communication, 167, 103169. https://doi.org/10.1016/j.specom.2024.103169

Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5(11), 831–843. https://doi.org/10.1038/nrn1533

OpenAI. (2023). GPT-4 Technical Report. https://cdn.openai.com/papers/gpt-4.pdf

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020. https://arxiv.org/abs/2103.00020

Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., ... & Kording, K. P. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22(11), 1761–1770. https://doi.org/10.1038/s41593-019-0520-2

Sejnowski, T. J. (2023). Large language models and the reverse Turing test. Neural Computation, 35(4), 567–580. https://doi.org/10.1162/neco_a_01604

Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365. https://doi.org/10.1038/nn.4244

