ResearchTrend.AI
Mnemosyne: Learning to Train Transformers with Transformers


2 February 2023
Deepali Jain, K. Choromanski, Kumar Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan
OffRL

Papers citing "Mnemosyne: Learning to Train Transformers with Transformers"

12 papers shown
Make Optimization Once and for All with Fine-grained Guidance
Mingjia Shi, Ruihan Lin, Xuxi Chen, Yuhao Zhou, Zezhen Ding, ..., Tong Wang, Kai Wang, Zhangyang Wang, J. Zhang, Tianlong Chen
53 · 1 · 0 · 14 Mar 2025
Dense Associative Memory Through the Lens of Random Features
Benjamin Hoover, Duen Horng Chau, Hendrik Strobelt, Parikshit Ram, Dmitry Krotov
BDL · 41 · 5 · 0 · 31 Oct 2024
Narrowing the Focus: Learned Optimizers for Pretrained Models
Gus Kristiansen, Mark Sandler, A. Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov
27 · 1 · 0 · 17 Aug 2024
Learning to Learn without Forgetting using Attention
Anna Vettoruzzo, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Thorsteinn Rögnvaldsson
CLL · 37 · 2 · 0 · 06 Aug 2024
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
K. Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamás Sarlós, Thomas Weingarten, Adrian Weller
17 · 8 · 0 · 03 Feb 2023
A Memory Transformer Network for Incremental Learning
Ahmet Iscen, Thomas Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid
CLL · 108 · 14 · 0 · 10 Oct 2022
Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
Xuesu Xiao, Tingnan Zhang, K. Choromanski, Edward J. Lee, Anthony G. Francis, ..., Leila Takayama, Roy Frostig, Jie Tan, Carolina Parada, Vikas Sindhwani
63 · 54 · 0 · 22 Sep 2022
On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
ViT · 39 · 14 · 0 · 15 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
VPVLM · 280 · 3,835 · 0 · 18 Apr 2021
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
MLAU · SILM · 267 · 1,808 · 0 · 14 Dec 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM · 251 · 2,009 · 0 · 28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 238 · 578 · 0 · 12 Mar 2020