ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization
arXiv:2505.17852 · 23 May 2025
Francois Chaubard, Mykel J. Kochenderfer
MQ, AI4CE

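For context, "zero-order" (gradient-free) optimization estimates descent directions from function evaluations alone, typically by perturbing the parameters in random directions. The sketch below is a generic two-point, SPSA-style estimator on a toy quadratic — an illustration of the general technique, not the specific method of the paper above; all names in it are made up for this example.

```python
import numpy as np

def zo_gradient(f, x, eps=1e-3, rng=None):
    """Two-point zero-order gradient estimate:
    g ~ (f(x + eps*u) - f(x - eps*u)) / (2*eps) * u,
    where u is a random Gaussian direction. No autodiff needed."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)              # random perturbation direction
    fd = (f(x + eps * u) - f(x - eps * u)) / (2 * eps)
    return fd * u                                 # unbiased estimate (in expectation) of grad f

def zo_minimize(f, x0, lr=0.1, steps=500, seed=0):
    """Plain SGD using only zero-order gradient estimates."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        x -= lr * zo_gradient(f, x, rng=rng)
    return x

# Minimize f(x) = ||x - 3||^2 without ever computing its gradient.
f = lambda x: float(np.sum((x - 3.0) ** 2))
x_star = zo_minimize(f, np.zeros(4))
```

Each step needs only two forward evaluations of `f`, which is what makes such methods attractive when backpropagation through time is too memory-hungry.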
Papers citing "Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization"

17 / 17 papers shown
Simultaneous Computation and Memory Efficient Zeroth-Order Optimizer for Fine-Tuning Large Language Models
Fei Wang, Li Shen, Liang Ding, Chao Xue, Ye Liu, Changxing Ding
13 Oct 2024

Demystify Mamba in Vision: A Linear Attention Perspective
Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang
Mamba
26 May 2024

Gradient-Free Training of Recurrent Neural Networks using Random Perturbations
Jesus Garcia Fernandez, Sander Keemink, Marcel van Gerven
AAML
14 May 2024

Guided-SPSA: Simultaneous Perturbation Stochastic Approximation assisted by the Parameter Shift Rule
Maniraman Periyasamy, Axel Plinge, Christopher Mutschler, Daniel D. Scherer, Wolfgang Mauerer
24 Apr 2024

Linear attention is (maybe) all you need (to understand transformer optimization)
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, S. Sra
02 Oct 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
ALM, PILM
27 Feb 2023

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM
27 May 2022

OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, ..., Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
VLM, OSLM, AI4CE
02 May 2022

Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
Paul Vicol, Luke Metz, Jascha Narain Sohl-Dickstein
27 Dec 2021

Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
VLM
14 Sep 2020

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
29 Jun 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
28 May 2020

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
12 Jun 2017

Unbiased Online Recurrent Optimization
Corentin Tallec, Yann Ollivier
16 Feb 2017

Decoupled Neural Interfaces using Synthetic Gradients
Max Jaderberg, Wojciech M. Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu
18 Aug 2016

Neural Turing Machines
Alex Graves, Greg Wayne, Ivo Danihelka
20 Oct 2014

Optimal rates for zero-order convex optimization: the power of two function evaluations
John C. Duchi, Michael I. Jordan, Martin J. Wainwright, Andre Wibisono
07 Dec 2013