arXiv:2102.13019 (v3, latest)

Investigating the Limitations of Transformers with Simple Arithmetic Tasks

25 February 2021
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Li
    LRM
ArXiv (abs) | PDF | HTML | GitHub (38★)

Papers citing "Investigating the Limitations of Transformers with Simple Arithmetic Tasks"

50 / 106 papers shown
Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
Yoshinari Fujinuma
ELM
139
0
0
21 Oct 2025
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Eran Malach
Omid Saremi
Sinead Williamson
Arwen Bradley
Aryo Lotfi
Emmanuel Abbe
J. Susskind
Etai Littwin
211
1
0
16 Oct 2025
Efficient numeracy in language models through single-token number embeddings
Linus Kreitner
Paul Hager
Jonathan Mengedoht
Georgios Kaissis
Daniel Rueckert
Martin Menten
149
3
0
08 Oct 2025
The Art of Breaking Words: Rethinking Multilingual Tokenizer Design
Aamod Thakur
Ajay Nagpal
Atharva Savarkar
Kundeshwar Pundalik
Siddhesh Dosi
Piyush Sawarkar
Viraj Thakur
Rohit Saluja
Maunendra Sankar Desarkar
Ganesh Ramakrishnan
240
2
0
03 Aug 2025
Long-Short Alignment for Effective Long-Context Modeling in LLMs
Tianqi Du
Haotian Huang
Yifei Wang
Yisen Wang
231
2
0
13 Jun 2025
Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
Yihe Dong
Lorenzo Noci
Mikhail Khodak
Mufan Li
547
1
0
01 Jun 2025
Recursive Decomposition with Dependencies for Generic Divide-and-Conquer Reasoning
Sergio Hernández-Gutiérrez
Minttu Alakuijala
Alexander Nikitin
Pekka Marttinen
LRM
319
3
0
05 May 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL LRM AI4CE
388
15
0
22 Mar 2025
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
596
40
0
17 Mar 2025
Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study
Wei Wei
Yue-Jiao Gong
Jun Zhang
Ting Huang
Jun Zhang
368
1
0
11 Mar 2025
Simulating the Real World: A Unified Survey of Multimodal Generative Models
Yuqi Hu
Longguang Wang
Xian Liu
L. Chen
Yuwei Guo
Yukai Shi
Ce Liu
Anyi Rao
Zeyu Wang
Hui Xiong
VGen SyDa
255
10
0
06 Mar 2025
The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs
Tanja Baeumel
Josef van Genabith
Simon Ostermann
LRM
453
11
0
27 Feb 2025
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang
Wei Huang
Selena Song
Haoyu Zhang
Yusuke Iwasawa
Y. Matsuo
Jiaxian Guo
LRM AI4CE
456
5
0
25 Feb 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
International Conference on Learning Representations (ICLR), 2025
Nikunj Saunshi
Nishanth Dikkala
Zhiyuan Li
Sanjiv Kumar
Sashank J. Reddi
OffRL LRM AI4CE
678
113
0
24 Feb 2025
Int2Int: a framework for mathematics with transformers
François Charton
ViT
479
1
0
22 Feb 2025
Learning the symmetric group: large from small
Max Petschack
Alexandr Garbali
Jan de Gier
AAML
248
1
0
18 Feb 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
689
26
0
03 Jan 2025
Quantifying artificial intelligence through algorithmic generalization
Nature Machine Intelligence (Nat. Mach. Intell.), 2024
Takuya Ito
Murray Campbell
L. Horesh
Tim Klinger
Parikshit Ram
ELM
502
0
0
08 Nov 2024
PatternBoost: Constructions in Mathematics with a Little Help from AI
François Charton
Jordan S. Ellenberg
Adam Zsolt Wagner
Geordie Williamson
226
33
0
01 Nov 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
364
13
0
17 Oct 2024
Language Models Encode Numbers Using Digit Representations in Base 10
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Amit Arnold Levy
Mor Geva
349
27
0
15 Oct 2024
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
Neural Information Processing Systems (NeurIPS), 2024
Alberto Alfarano
François Charton
Amaury Hayat
268
37
0
10 Oct 2024
MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks
M. Bueno
R. Lotufo
Rodrigo Nogueira
LRM
296
0
0
08 Oct 2024
RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals
Yuyang Miao
Zehua Chen
Chong Li
Danilo Mandic
DiffM MedIm
388
14
0
06 Oct 2024
Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhejian Zhou
Jiayu Wang
Dahua Lin
Kai Chen
LRM
297
6
0
25 Sep 2024
Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
Anna Mészáros
Szilvia Ujváry
Wieland Brendel
Patrik Reizinger
Ferenc Huszár
321
2
0
09 Sep 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
International Conference on Machine Learning (ICML), 2024
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
401
44
0
03 Sep 2024
Learning the Simplicity of Scattering Amplitudes
SciPost Physics (SciPost Phys.), 2024
Clifford Cheung
Aurélien Dersy
Matthew D. Schwartz
345
6
0
08 Aug 2024
Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks
Xingcheng Xu
Zibo Zhao
Haipeng Zhang
Yanqing Yang
LRM
320
0
0
25 Jul 2024
The Extrapolation Power of Implicit Models
Juliette Decugis
Alicia Y. Tsai
Max Emerling
Ashwin Ganesh
L. Ghaoui
247
0
0
19 Jul 2024
Numbers Matter! Bringing Quantity-awareness to Retrieval Systems
Satya Almasian
Milena Bruseva
Michael Gertz
253
2
0
14 Jul 2024
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark
Kyle Montgomery
Kefei Duan
Dawn Song
Chenguang Wang
LRM CLL ReLM
248
2
0
05 Jul 2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun
So Yeon Min
Yingshan Chang
Yonatan Bisk
389
21
0
27 Jun 2024
Less can be more for predicting properties with large language models
Nawaf Alampara
Santiago Miret
Kevin Maik Jablonka
494
10
0
25 Jun 2024
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou
Deqing Fu
Willie Neiswanger
Robin Jia
LRM
329
38
0
05 Jun 2024
Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
ReLM LRM
346
5
0
05 Jun 2024
Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks
Andrew Gambardella
Yusuke Iwasawa
Yutaka Matsuo
LRM
227
22
0
04 Jun 2024
Arbitrary-Length Generalization for Addition in a Tiny Transformer
A. G. Patriota
191
0
0
31 May 2024
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Jian-Qiao Zhu
Haijiang Yan
Thomas Griffiths
385
10
0
29 May 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
361
8
0
26 May 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
578
7
0
24 May 2024
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory
Tianji Cai
G. W. Merz
François Charton
Niklas Nolte
Matthias Wilhelm
K. Cranmer
Lance J. Dixon
395
24
0
09 May 2024
Position: Understanding LLMs Requires More Than Statistical Generalization
International Conference on Machine Learning (ICML), 2024
Patrik Reizinger
Szilvia Ujváry
Anna Mészáros
A. Kerekes
Wieland Brendel
Ferenc Huszár
411
24
0
03 May 2024
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Elizabeth Fons
Rachneet Kaur
Soham Palande
Zhen Zeng
Svitlana Vyetrenko
T. Balch
AI4TS
303
33
0
25 Apr 2024
Laying Anchors: Semantically Priming Numerals in Language Modeling
Mandar Sharma
Rutuja Murlidhar Taware
Pravesh Koirala
Nikhil Muralidhar
Naren Ramakrishnan
372
4
0
02 Apr 2024
A Neuro-Symbolic Approach to Monitoring Salt Content in Food
Anuja Tayal
Barbara Di Eugenio
Devika Salunke
Andrew D. Boyd
Carolyn Dickens
Eulalia P Abril
Olga Garcia-Bedoya
Paula Allen-Meares
417
3
0
01 Apr 2024
A Theory for Length Generalization in Learning to Reason
Changnan Xiao
Bing Liu
LRM
396
12
0
31 Mar 2024
Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks
Yuncheng Huang
Qi He
Yipei Xu
Jiaqing Liang
Yanghua Xiao
LRM
218
1
0
14 Mar 2024
tsGT: Stochastic Time Series Modeling With Transformer
Lukasz Kuciński
Witold Drzewakowski
Mateusz Olko
Piotr Kozakowski
Lukasz Maziarka
Marta Emilia Nowakowska
Lukasz Kaiser
Piotr Miłoś
314
4
0
08 Mar 2024
RORA: Robust Free-Text Rationale Evaluation
Zhengping Jiang
Yining Lu
Hanjie Chen
Daniel Khashabi
Benjamin Van Durme
Anqi Liu
317
7
0
28 Feb 2024
Page 1 of 3