The pitfalls of next-token prediction

11 March 2024
Gregor Bachmann
Vaishnavh Nagarajan

Papers citing "The pitfalls of next-token prediction"

50 / 53 papers shown
Title
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
37
0
0
09 May 2025
Revisiting Data Auditing in Large Vision-Language Models
Hongyu Zhu
Sichu Liang
W. Wang
Boheng Li
Tongxin Yuan
Fangqi Li
Shilin Wang
Zhuosheng Zhang
VLM
88
0
0
25 Apr 2025
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan
Chen Henry Wu
Charles Ding
Aditi Raghunathan
28
0
0
21 Apr 2025
Looking beyond the next token
Abitha Thankaraj
Yiding Jiang
J. Zico Kolter
Yonatan Bisk
LRM
46
1
0
15 Apr 2025
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition
Rishi Hazra
Gabriele Venturato
Pedro Zuidberg Dos Martires
Luc de Raedt
ReLM
LRM
58
0
0
04 Apr 2025
Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion
Chang Chen
Hany Hamed
Doojin Baek
Taegu Kang
Yoshua Bengio
Sungjin Ahn
47
0
0
25 Mar 2025
Efficient Joint Prediction of Multiple Future Tokens
Kwangjun Ahn
Alex Lamb
John Langford
37
0
0
24 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
48
0
0
13 Mar 2025
Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye
Zhenyu Wu
Jiahui Gao
Zhiyong Wu
Xin Jiang
Z. Li
Lingpeng Kong
DiffM
43
2
0
27 Feb 2025
The Belief State Transformer
E. Hu
Kwangjun Ahn
Qinghua Liu
Haoran Xu
Manan Tomar
Ada Langford
Dinesh Jayaraman
Alex Lamb
John Langford
66
0
0
21 Feb 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
80
1
0
21 Feb 2025
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
Haoyang Li
Xuejia Chen
Zhanchao Xu
Darian Li
Nicole Hu
...
Y. Li
Luyu Qiu
C. Zhang
Qing Li
Lei Chen
LRM
ELM
34
1
0
16 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
66
4
0
03 Feb 2025
Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation
Sukai Huang
Trevor Cohn
N. Lipovetzky
LRM
73
1
0
14 Dec 2024
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Sohee Yang
Nora Kassner
E. Gribovskaya
Sebastian Riedel
Mor Geva
KELM
LRM
ReLM
78
4
0
25 Nov 2024
Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation
Chenyang An
Shima Imani
Feng Yao
Chengyu Dong
Ali Abbasi
...
Samuel Buss
Jingbo Shang
Gayathri Mahalingam
Pramod Sharma
Maurice Diesendruck
LRM
26
1
0
30 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Nicholas Walker
19
0
0
23 Oct 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Lingpeng Kong
AI4CE
63
14
0
23 Oct 2024
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng-Tao Xu
Nick Barnes
F. Khan
Salman Khan
Deng-Ping Fan
41
4
0
22 Oct 2024
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Zhepeng Cen
Yao Liu
Siliang Zeng
Pratik Chaudhari
Huzefa Rangwala
George Karypis
Rasool Fakoor
SyDa
AIFin
16
3
0
18 Oct 2024
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye
Jiahui Gao
Shansan Gong
Lin Zheng
Xin Jiang
Z. Li
Lingpeng Kong
DiffM
LRM
37
15
0
18 Oct 2024
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund
LRM
14
3
0
17 Oct 2024
SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation
Guanhua Zhang
Mohamed Ahmed
Zhiming Hu
Andreas Bulling
AI4TS
16
1
0
10 Oct 2024
Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance
Seungyong Moon
Bumsoo Park
Hyun Oh Song
RALM
AIFin
16
1
0
03 Oct 2024
Semformer: Transformer Language Models with Semantic Planning
Yongjing Yin
Junran Ding
Kai Song
Yue Zhang
31
3
0
17 Sep 2024
Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles
Kulin Shah
Nishanth Dikkala
Xin Wang
Rina Panigrahy
ELM
ReLM
LRM
21
9
0
16 Sep 2024
Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games
Juhwan Choi
Youngbin Kim
38
0
0
10 Sep 2024
Imitating Language via Scalable Inverse Reinforcement Learning
Markus Wulfmeier
Michael Bloesch
Nino Vieillard
Arun Ahuja
Jorg Bornschein
...
Jost Tobias Springenberg
Nikola Momchev
Olivier Bachem
Matthieu Geist
Martin Riedmiller
27
9
0
02 Sep 2024
How Susceptible are LLMs to Influence in Prompts?
Sotiris Anagnostidis
Jannis Bulian
LRM
25
16
0
17 Aug 2024
Can Large Language Models Reason? A Characterization via 3-SAT
Rishi Hazra
Gabriele Venturato
Pedro Zuidberg Dos Martires
Luc de Raedt
ELM
ReLM
LRM
19
4
0
13 Aug 2024
Mental Modeling of Reinforcement Learning Agents by Language Models
Wenhao Lu
Xufeng Zhao
Josua Spisak
Jae Hee Lee
Stefan Wermter
LLMAG
LRM
LM&Ro
22
2
0
26 Jun 2024
Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training
Yixuan Wang
Xianzhen Luo
Fuxuan Wei
Yijun Liu
Qingfu Zhu
Xuanyu Zhang
Qing Yang
Dongliang Xu
Wanxiang Che
35
3
0
25 Jun 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Amrith Rajagopal Setlur
Saurabh Garg
Xinyang Geng
Naman Garg
Virginia Smith
Aviral Kumar
35
45
0
20 Jun 2024
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Akshay Paruchuri
Jake Garrison
Shun Liao
John Hernandez
Jacob Sunshine
Tim Althoff
Xin Liu
Daniel J. McDuff
LRM
23
7
0
18 Jun 2024
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction
Yinda Chen
Haoyuan Shi
Xiaoyu Liu
Te Shi
Ruobing Zhang
Dong Liu
Zhiwei Xiong
Feng Wu
36
9
0
27 May 2024
Transformers represent belief state geometry in their residual stream
A. Shai
Sarah E. Marzen
Lucas Teixeira
Alexander Gietelink Oldenziel
P. Riechers
AI4CE
16
10
0
24 May 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Ziyu Wan
Weinan Zhang
Jun Wang
Ying Wen
33
7
0
23 May 2024
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
Jun Zhao
Jingqi Tong
Yurong Mou
Ming Zhang
Qi Zhang
Xuanjing Huang
LRM
42
3
0
05 May 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Xiao Wang
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
33
48
0
15 Apr 2024
Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi
Denise Lee
Gabriel Grand
Muxin Liu
Winson Cheng
Archit Sharma
Noah D. Goodman
RALM
AIFin
LRM
28
44
0
01 Apr 2024
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami
Ali Ghodsi
VLM
26
6
0
28 Feb 2024
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
48
14
0
23 Aug 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
204
498
0
03 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
203
2,232
0
22 Mar 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
223
2,413
0
06 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
On the Paradox of Learning to Reason from Data
Honghua Zhang
Liunian Harold Li
Tao Meng
Kai-Wei Chang
Guy Van den Broeck
NAI
ReLM
OOD
LRM
132
102
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022