Neuromodulation Gated Transformer
Kobe Knowles
Joshua Bensemann
Diana Benavides-Prado
Vithya Yogarajan
Michael Witbrock
Gillian Dobbie
Yang Chen

Abstract
We introduce a novel architecture, the Neuromodulation Gated Transformer (NGT), which is a simple implementation of neuromodulation in transformers via a multiplicative effect. We compare it to baselines and show that it results in the best average performance on the SuperGLUE benchmark validation sets.
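As a rough illustration of what a multiplicative effect can look like, the sketch below scales each feature of a hidden vector by a learned gate value. The abstract does not specify the gating function or where it sits in the transformer, so the sigmoid gate, the per-feature linear projection, and the names `gate_values` and `modulate` are assumptions made for illustration only, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_values(hidden: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical gate: one value in (0, 1) per feature, sigmoid(W h + b)."""
    return [
        sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(weights, bias)
    ]


def modulate(hidden: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Apply the gate multiplicatively (element-wise) to the hidden vector."""
    gates = gate_values(hidden, weights, bias)
    return [g * h for g, h in zip(gates, hidden)]


# Usage: with zero weights and zero bias every gate is sigmoid(0) = 0.5,
# so each activation is halved.
hidden_state = [2.0, -4.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
modulated = modulate(hidden_state, zero_weights, zero_bias)
```

Because a sigmoid gate stays strictly between 0 and 1, this particular choice can only attenuate activations; a gate that also amplifies would need a different range, which is a separate design decision that the abstract leaves open.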