Versions: v1, v2, v3, v4 (latest)

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

26 August 2021

Papers citing "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers"

50 / 104 papers shown

Closing the Curvature Gap: Full Transformer Hessians and Their Implications for Scaling Laws

123

19 Oct 2025

Learning neuro-symbolic convergent term rewriting systems

125

25 Jul 2025

Scaling can lead to compositional generalization

204

09 Jul 2025

Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

230

04 Jun 2025

Characterizing Pattern Matching and Its Limits on Compositional Task Structures

278

26 May 2025

TRACE for Tracking the Emergence of Semantic Representations in Transformers

Nura Aljaafari

Danilo S. Carvalho

André Freitas

240

23 May 2025

Comparison of Different Deep Neural Network Models in the Cultural Heritage Domain

229

30 Apr 2025

Exploring Compositional Generalization (in COGS/ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)

William Bruns

609

21 Apr 2025

Context-aware Biases for Length Extrapolation

Ali Veisi

Hamidreza Amirzadeh

Amir Mansourian

563

11 Mar 2025

Structural Deep Encoding for Table Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

277

03 Mar 2025

The Role of Sparsity for Length Generalization in Transformers

237

24 Feb 2025

Analyzing the Inner Workings of Transformers in Compositional Generalization

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Ryoma Kumon

Hitomi Yanaka

327

24 Feb 2025

Compositional Generalization Across Distributional Shifts with Sparse Tree Operations

Neural Information Processing Systems (NeurIPS), 2024

315

18 Dec 2024

Quantifying artificial intelligence through algorithmic generalization

Nature Machine Intelligence (Nat. Mach. Intell.), 2024

442

08 Nov 2024

Overcoming classic challenges for artificial neural networks by providing incentives and practice

Nature Machine Intelligence (Nat. Mach. Intell.), 2024

Kazuki Irie

Brenden M. Lake

577

14 Oct 2024

Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting

273

12 Jul 2024

Teaching Transformers Causal Reasoning through Axiomatic Training

Aniket Vashishtha

Abhinav Kumar

Atharva Pandey

Abbavaram Gowtham Reddy

Amit Sharma

Vineeth N. Balasubramanian

413

10 Jul 2024

Are there identifiable structural parts in the sentence embedding whole?

Vivi Nastase

Paola Merlo

198

24 Jun 2024

Evaluating Structural Generalization in Neural Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

199

19 Jun 2024

On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions

Denys Pushkin

Raphael Berthier

Emmanuel Abbe

206

10 Jun 2024

MoEUT: Mixture-of-Experts Universal Transformers

Christopher D. Manning

MoE

247

25 May 2024

From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks

516

24 May 2024

Philosophy of Cognitive Science in the Age of Deep Learning

Raphaël Millière

AI4CE NAI

219

07 May 2024

What makes Models Compositional? A Theoretical View: With Supplement

International Joint Conference on Artificial Intelligence (IJCAI), 2024

277

02 May 2024

Setting up the Data Printer with Improved English to Ukrainian Machine Translation

266

23 Apr 2024

Sequential Compositional Generalization in Multimodal Models

195

18 Apr 2024

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory

204

18 Apr 2024

Towards Understanding the Relationship between In-context Learning and Compositional Generalization

International Conference on Language Resources and Evaluation (LREC), 2024

Sungjun Han

Sebastian Padó

CoGe

210

18 Mar 2024

A Neural Rewriting System to Solve Algorithmic Problems

244

27 Feb 2024

Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies

313

27 Feb 2024

Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Yichen Jiang

Xiang Zhou

Mohit Bansal

274

09 Feb 2024

Limits of Transformer Language Models on Learning to Compose Algorithms

Jonathan Thomm

Aleksandar Terzić

Giacomo Camposampiero

Michael Hersche

Bernhard Schölkopf

Abbas Rahimi

478

08 Feb 2024

On the generalization capacity of neural networks during generic multimodal reasoning

International Conference on Learning Representations (ICLR), 2024

229

26 Jan 2024

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

International Conference on Machine Learning (ICML), 2023

325

21 Nov 2023

Attribute Diversity Determines the Systematicity Gap in VQA

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ian Berlot-Attwell

Kumar Krishna Agrawal

A. M. Carrell

Yash Sharma

Naomi Saphra

254

15 Nov 2023

Data Factors for Better Compositional Generalization

Xiang Zhou

Yichen Jiang

Mohit Bansal

CoGe OOD

190

08 Nov 2023

Syntax-Guided Transformers: Elevating Compositional Generalization and Grounding in Multimodal Environments

Danial Kamali

Parisa Kordjamshidi

210

07 Nov 2023

The Impact of Depth on Compositional Generalization in Transformer Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Jackson Petty

Sjoerd van Steenkiste

313

30 Oct 2023

SLOG: A Structural Generalization Benchmark for Semantic Parsing

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

196

23 Oct 2023

Structural generalization in COGS: Supertagging is (almost) all you need

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

192

21 Oct 2023

Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Osman Batur İnce

Tanin Zeraati

Semih Yagcioglu

Yadollah Yaghoobzadeh

Erkut Erdem

Aykut Erdem

164

18 Oct 2023

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Miguel Angel Bautista

...

331

13 Oct 2023

Sparse Universal Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Shawn Tan

Songlin Yang

Zhenfang Chen

Aaron Courville

Chuang Gan

MoE

260

11 Oct 2023

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

242

05 Oct 2023

Compositional Program Generation for Few-Shot Systematic Generalization

Luke Liu

215

28 Sep 2023

Efficient Benchmarking of Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Michal Shmueli-Scheuer

Leshem Choshen

ALM

515

22 Aug 2023

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

International Conference on Learning Representations (ICLR), 2023

Yinlin Deng

184

26 Jul 2023

A Hybrid System for Systematic Generalization in Simple Arithmetic Problems

International Workshop on Neural-Symbolic Learning and Reasoning (NeSy), 2023

186

29 Jun 2023

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

Pranava Madhyastha

Jingbo Zhu

159

24 Jun 2023

Differentiable Tree Operations Promote Compositional Generalization

International Conference on Machine Learning (ICML), 2023

137

01 Jun 2023