
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Versions: v1, v2, v3 (latest)


14 July 2025
Sangmin Bae
Yujin Kim
Reza Bayat
S. Kim
Jiyoun Ha
Tal Schuster
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Aaron Courville
Se-Young Yun
    MoE
ArXiv (abs) | PDF | HTML | HuggingFace (55 upvotes) | GitHub (490★)

Papers citing "Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation"

20 / 20 papers shown
Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers
H. Lin
Zhiqi Bai
X. Zhang
Sen Yang
Xiang Li
...
Yongchi Zhao
Jiamang Wang
Yuchi Xu
Wenbo Su
B. Zheng
132
0
0
03 Dec 2025
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
Haozhe Liu
Ding Liu
Mingchen Zhuge
Zijian Zhou
Tian Xie
...
Juan-Manuel Perez-Rua
Tao Xiang
Wei Liu
Shikun Liu
Jürgen Schmidhuber
105
0
0
15 Nov 2025
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Tianyu Fu
Yichen You
Z. Chen
Guohao Dai
Huazhong Yang
Yu Wang
LRM
189
1
0
11 Nov 2025
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish
Ang Li
John Kirchenbauer
Dayal Singh Kalra
Brian Bartoldson
B. Kailkhura
Avi Schwarzschild
Jonas Geiping
Tom Goldstein
Micah Goldblum
277
1
0
10 Nov 2025
Route Experts by Sequence, not by Token
Tiansheng Wen
Y. Wang
Aosong Feng
Long Ma
Xinyang Liu
Y. Wang
Lixuan Guo
Bo Chen
Stefanie Jegelka
Chenyu You
MoE
175
1
0
09 Nov 2025
From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
Lei Liu
Zhongyi Yu
Hong Wang
Huanshuo Dong
Haiyang Xin
Hongwei Zhao
B. Li
156
0
0
27 Oct 2025
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
Beomseok Kang
Jiwon Song
Jae-Joon Kim
LRM
141
0
0
16 Oct 2025
On the Reasoning Abilities of Masked Diffusion Language Models
Anej Svete
Ashish Sabharwal
DiffM, LRM
111
0
0
15 Oct 2025
Dr.LLM: Dynamic Layer Routing in LLMs
Ahmed Heakl
Martin Gubri
Salman Khan
Sangdoo Yun
Seong Joon Oh
ReLM
342
1
1
14 Oct 2025
Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production
Alexandre Galashov
Matt Jones
Rosemary Ke
Yuan Cao
Vaishnavh Nagarajan
Michael C. Mozer
113
0
0
13 Oct 2025
DND: Boosting Large Language Models with Dynamic Nested Depth
Tieyuan Chen
Xiaodong Chen
Haoxing Chen
Zhenzhong Lan
W. Lin
Jianguo Li
MoE
230
0
0
13 Oct 2025
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Qiaozhe Zhang
Jun Sun
Ruijie Zhang
Yingzhuang Liu
191
0
0
09 Oct 2025
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts
Yeskendir Koishekenov
Aldo Lipani
Nicola Cancedda
LRM
150
1
0
08 Oct 2025
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
Umberto Cappellazzo
Minsu Kim
Pingchuan Ma
Honglie Chen
Xubo Liu
Stavros Petridis
Maja Pantic
MoE
155
0
0
05 Oct 2025
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
Cai Zhou
Chenxiao Yang
Yi Hu
Chenyu Wang
Chubin Zhang
Muhan Zhang
Lester Mackey
Tommi Jaakkola
Stephen Bates
Dinghuai Zhang
155
4
0
03 Oct 2025
Composer: A Search Framework for Hybrid Neural Architecture Design
Bilge Acun
Prasoon Sinha
Newsha Ardalani
Sangmin Bae
Alicia Golden
Chien-Yu Lin
Meghana Madhyastha
Fei Sun
N. Yadwadkar
Carole-Jean Wu
223
1
0
01 Oct 2025
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
Naibin Gu
Zhenyu Zhang
Yuchen Feng
Yilong Chen
Peng Fu
...
Shuohuan Wang
Yu Sun
Hua Wu
Weiping Wang
Haifeng Wang
MoE
87
0
0
26 Sep 2025
A Formal Comparison Between Chain-of-Thought and Latent Thought
Kevin Xu
Issei Sato
ReLM, LRM
111
1
0
25 Sep 2025
IMC-Net: A Lightweight Content-Conditioned Encoder with Multi-Pass Processing for Image Classification
YiZhou Li
ViT
239
0
0
29 Jul 2025
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Xinghao Chen
Anhao Zhao
Heming Xia
Xuan Lu
Hanlin Wang
Yanjun Chen
Wei Zhang
Jian Wang
W. Li
Xiaoyu Shen
ReLM, LRM
383
18
0
22 May 2025