The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
26 August 2021
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
    ViT
arXiv: 2108.12284

Papers citing "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers"

50 / 104 papers shown
Closing the Curvature Gap: Full Transformer Hessians and Their Implications for Scaling Laws
Egor Petrov
Nikita Kiselev
Vladislav Meshkov
Andrey Grabovoy
123
0
0
19 Oct 2025
Learning neuro-symbolic convergent term rewriting systems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
NAI
125
0
0
25 Jul 2025
Scaling can lead to compositional generalization
Florian Redhardt
Yassir Akram
Simon Schug
GNN
CoGe
204
0
0
09 Jul 2025
Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ivan Vegner
Sydelle de Souza
Valentin Forch
Martha Lewis
Leonidas A.A. Doumas
230
3
0
04 Jun 2025
Characterizing Pattern Matching and Its Limits on Compositional Task Structures
Hoyeon Chang
Jinho Park
Hanseul Cho
Sohee Yang
Miyoung Ko
Hyeonbin Hwang
Seungpil Won
Dohaeng Lee
Youbin Ahn
Minjoon Seo
278
1
0
26 May 2025
TRACE for Tracking the Emergence of Semantic Representations in Transformers
Nura Aljaafari
Danilo S. Carvalho
André Freitas
240
1
0
23 May 2025
Comparison of Different Deep Neural Network Models in the Cultural Heritage Domain
Teodor Boyadzhiev
Gabriele Lagani
Luca Ciampi
Giuseppe Amato
Krassimira Ivanova
VLM
229
1
0
30 Apr 2025
Exploring Compositional Generalization (in COGS/ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns
609
0
0
21 Apr 2025
Context-aware Biases for Length Extrapolation
Ali Veisi
Hamidreza Amirzadeh
Amir Mansourian
563
2
0
11 Mar 2025
Structural Deep Encoding for Table Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Raphael Mouravieff
Benjamin Piwowarski
Sylvain Lamprier
LMTD
277
2
0
03 Mar 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
237
6
0
24 Feb 2025
Analyzing the Inner Workings of Transformers in Compositional Generalization
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Ryoma Kumon
Hitomi Yanaka
327
1
0
24 Feb 2025
Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
Neural Information Processing Systems (NeurIPS), 2024
Paul Soulos
Henry Conklin
Mattia Opper
P. Smolensky
Jianfeng Gao
Roland Fernandez
315
6
0
18 Dec 2024
Quantifying artificial intelligence through algorithmic generalization
Nature Machine Intelligence (Nat. Mach. Intell.), 2024
Takuya Ito
Murray Campbell
L. Horesh
Tim Klinger
Parikshit Ram
ELM
442
0
0
08 Nov 2024
Overcoming classic challenges for artificial neural networks by providing incentives and practice
Nature Machine Intelligence (Nat. Mach. Intell.), 2024
Kazuki Irie
Brenden M. Lake
577
8
0
14 Oct 2024
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Jinning Li
Jiachen Li
Sangjae Bae
David Isele
273
7
0
12 Jul 2024
Teaching Transformers Causal Reasoning through Axiomatic Training
Aniket Vashishtha
Abhinav Kumar
Atharva Pandey
Abbavaram Gowtham Reddy
Vineeth N. Balasubramanian
Amit Sharma
413
8
0
10 Jul 2024
Are there identifiable structural parts in the sentence embedding whole?
Vivi Nastase
Paola Merlo
198
6
0
24 Jun 2024
Evaluating Structural Generalization in Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ryoma Kumon
Daiki Matsuoka
Hitomi Yanaka
NAI
199
2
0
19 Jun 2024
On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
Denys Pushkin
Raphael Berthier
Emmanuel Abbe
206
0
0
10 Jun 2024
MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
Christopher Potts
Christopher D. Manning
MoE
247
28
0
25 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
AI4CE
516
6
0
24 May 2024
Philosophy of Cognitive Science in the Age of Deep Learning
Raphaël Millière
AI4CE
NAI
219
8
0
07 May 2024
What makes Models Compositional? A Theoretical View: With Supplement
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Parikshit Ram
Tim Klinger
Alexander G. Gray
CoGe
277
8
0
02 May 2024
Setting up the Data Printer with Improved English to Ukrainian Machine Translation
Yurii Paniv
Dmytro Chaplynskyi
Nikita Trynus
Volodymyr Kyrylov
AI4CE
266
3
0
23 Apr 2024
Sequential Compositional Generalization in Multimodal Models
Semih Yagcioglu
Osman Batur İnce
Aykut Erdem
Erkut Erdem
Desmond Elliott
Deniz Yuret
195
1
0
18 Apr 2024
Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
Hung Le
D. Nguyen
Kien Do
Svetha Venkatesh
T. Tran
204
0
0
18 Apr 2024
Towards Understanding the Relationship between In-context Learning and Compositional Generalization
International Conference on Language Resources and Evaluation (LREC), 2024
Sungjun Han
Sebastian Padó
CoGe
210
5
0
18 Mar 2024
A Neural Rewriting System to Solve Algorithmic Problems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
NAI
244
2
0
27 Feb 2024
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
ELM
313
14
0
27 Feb 2024
Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yichen Jiang
Xiang Zhou
Mohit Bansal
274
1
0
09 Feb 2024
Limits of Transformer Language Models on Learning to Compose Algorithms
Jonathan Thomm
Aleksandar Terzić
Giacomo Camposampiero
Michael Hersche
Bernhard Schölkopf
Abbas Rahimi
478
11
0
08 Feb 2024
On the generalization capacity of neural networks during generic multimodal reasoning
International Conference on Learning Representations (ICLR), 2024
Takuya Ito
Soham Dan
Mattia Rigotti
James Kozloski
Murray Campbell
LRM
229
4
0
26 Jan 2024
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
International Conference on Machine Learning (ICML), 2023
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
325
14
0
21 Nov 2023
Attribute Diversity Determines the Systematicity Gap in VQA
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ian Berlot-Attwell
Kumar Krishna Agrawal
A. M. Carrell
Yash Sharma
Naomi Saphra
254
2
0
15 Nov 2023
Data Factors for Better Compositional Generalization
Xiang Zhou
Yichen Jiang
Mohit Bansal
CoGe
OOD
190
7
0
08 Nov 2023
Syntax-Guided Transformers: Elevating Compositional Generalization and Grounding in Multimodal Environments
Danial Kamali
Parisa Kordjamshidi
210
1
0
07 Nov 2023
The Impact of Depth on Compositional Generalization in Transformer Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jackson Petty
Sjoerd van Steenkiste
Ishita Dasgupta
Fei Sha
Daniel H Garrette
Tal Linzen
AI4CE
VLM
313
30
0
30 Oct 2023
SLOG: A Structural Generalization Benchmark for Semantic Parsing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bingzhi Li
L. Donatelli
Alexander Koller
Tal Linzen
Yuekun Yao
Najoung Kim
196
19
0
23 Oct 2023
Structural generalization in COGS: Supertagging is (almost) all you need
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Alban Petit
Caio Corro
François Yvon
NAI
192
1
0
21 Oct 2023
Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Osman Batur İnce
Tanin Zeraati
Semih Yagcioglu
Yadollah Yaghoobzadeh
Erkut Erdem
Aykut Erdem
164
3
0
18 Oct 2023
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Samira Abnar
Omid Saremi
Laurent Dinh
Shantel Wilson
Miguel Angel Bautista
...
Vimal Thilak
Etai Littwin
Jiatao Gu
Josh Susskind
Samy Bengio
331
8
0
13 Oct 2023
Sparse Universal Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shawn Tan
Songlin Yang
Zhenfang Chen
Aaron Courville
Chuang Gan
MoE
260
24
0
11 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
242
15
0
05 Oct 2023
Compositional Program Generation for Few-Shot Systematic Generalization
Tim Klinger
Luke Liu
Soham Dan
A. Rezaee
Parikshit Ram
Ali Movaghar
NAI
215
4
0
28 Sep 2023
Efficient Benchmarking of Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
515
38
0
22 Aug 2023
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Kensen Shi
Joey Hong
Yinlin Deng
Pengcheng Yin
Manzil Zaheer
Charles Sutton
184
20
0
26 Jul 2023
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems
International Workshop on Neural-Symbolic Learning and Reasoning (NeSy), 2023
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
AIMat
LRM
186
1
0
29 Jun 2023
Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations
Xinyu Liu
Yanl Ding
Kaikai An
Chunyang Xiao
Pranava Madhyastha
Tong Xiao
Jingbo Zhu
159
2
0
24 Jun 2023
Differentiable Tree Operations Promote Compositional Generalization
International Conference on Machine Learning (ICML), 2023
Paul Soulos
J. E. Hu
Kate McCurdy
Yunmo Chen
Roland Fernandez
P. Smolensky
Jianfeng Gao
AI4CE
137
7
0
01 Jun 2023