

Out-of-distribution generalization via composition: a lens through induction heads in Transformers
arXiv:2408.09503
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
31 December 2024
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong

Papers citing "Out-of-distribution generalization via composition: a lens through induction heads in Transformers"

Showing 49 of 99 citing papers.
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Neural Information Processing Systems (NeurIPS), 2023
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
06 Jun 2023

The Impact of Positional Encoding on Length Generalization in Transformers
Neural Information Processing Systems (NeurIPS), 2023
Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan N. Ramamurthy, Payel Das, Siva Reddy
31 May 2023
Faith and Fate: Limits of Transformers on Compositionality
Neural Information Processing Systems (NeurIPS), 2023
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, ..., Sean Welleck, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, Yejin Choi
29 May 2023

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Neural Information Processing Systems (NeurIPS), 2023
Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim, He He
24 May 2023
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
Xiaojuan Tang, Zilong Zheng, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, Muhan Zhang
24 May 2023

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen
16 May 2023

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, Roy Ka-wei Lee
04 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, ..., USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
03 Apr 2023

GPT-4 Technical Report
OpenAI: Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph
15 Mar 2023

A Theory of Emergent In-Context Learning as Implicit Structure Induction
Michael Hahn, Navin Goyal
14 Mar 2023
Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning
International Conference on Learning Representations (ICLR), 2023
Zhen Wang, Yikang Shen, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
06 Mar 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
27 Feb 2023

Do Deep Neural Networks Capture Compositionality in Arithmetic Reasoning?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa, Keisuke Sakaguchi, Kentaro Inui
15 Feb 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
International Conference on Machine Learning (ICML), 2023
Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk
30 Jan 2023

Progress measures for grokking via mechanistic interpretability
International Conference on Learning Representations (ICLR), 2023
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
12 Jan 2023

Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
15 Dec 2022
Editing Models with Task Arithmetic
International Conference on Learning Representations (ICLR), 2023
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
08 Dec 2022

What learning algorithm is in-context learning? Investigations with linear models
International Conference on Learning Representations (ICLR), 2023
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
28 Nov 2022

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
International Conference on Learning Representations (ICLR), 2023
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

Omnigrok: Grokking Beyond Algorithmic Data
International Conference on Learning Representations (ICLR), 2023
Ziming Liu, Eric J. Michaud, Max Tegmark
03 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022

Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
21 Sep 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Neural Information Processing Systems (NeurIPS), 2022
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022
Exploring Length Generalization in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, V. Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur
11 Jul 2022

Emergent Abilities of Large Language Models
Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus
15 Jun 2022

Unveiling Transformers with LEGO: a synthetic reasoning task
Yi Zhang, A. Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner
09 Jun 2022
Large Language Models are Zero-Shot Reasoners
Neural Information Processing Systems (NeurIPS), 2022
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
24 May 2022

Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, M. Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
25 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Alethea Power, Yuri Burda, Harrison Edwards, Igor Babuschkin, Vedant Misra
06 Jan 2022

An Explanation of In-context Learning as Implicit Bayesian Inference
International Conference on Learning Representations (ICLR), 2022
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
03 Nov 2021

MetaICL: Learning to Learn In Context
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Sewon Min, M. Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi
29 Oct 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
27 Oct 2021

Meta-learning via Language Model In-context Tuning
Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis, He He
15 Oct 2021

RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
20 Apr 2021

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
18 Apr 2021
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, Luke Zettlemoyer
16 Apr 2021

Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors
Workshop on Knowledge Extraction and Integration for Deep Learning Architectures; Deep Learning Inside Out (DEELIO), 2021
Zeyu Yun, Yubei Chen, Bruno A. Olshausen, Yann LeCun
29 Mar 2021

Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds, Kyle McDonell
15 Feb 2021

Transformer Feed-Forward Layers Are Key-Value Memories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
29 Dec 2020
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
26 Jul 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017
Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
14 Jan 2016

Show and Tell: A Neural Image Caption Generator
Computer Vision and Pattern Recognition (CVPR), 2015
Oriol Vinyals, Alexander Toshev, Samy Bengio, D. Erhan
17 Nov 2014

Neural Machine Translation by Jointly Learning to Align and Translate
International Conference on Learning Representations (ICLR), 2015
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
01 Sep 2014

Efficient Estimation of Word Representations in Vector Space
International Conference on Learning Representations (ICLR), 2013
Tomas Mikolov, Kai Chen, G. Corrado, J. Dean
16 Jan 2013