Large Memory Layers with Product Keys

10 July 2019
Guillaume Lample
Alexandre Sablayrolles
Marc'Aurelio Ranzato
Ludovic Denoyer
Hervé Jégou
    MoE
arXiv: 1907.05242

Papers citing "Large Memory Layers with Product Keys"

31 / 31 papers shown
Large Memory Network for Recommendation
Hui Lu
Zheng Chai
Y. Zheng
Zhe Chen
Deping Xie
Peng Xu
Xun Zhou
56
0
0
08 Feb 2025
An Evolved Universal Transformer Memory
Edoardo Cetin
Qi Sun
Tianyu Zhao
Yujin Tang
132
0
0
17 Oct 2024
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
36
7
0
15 Oct 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
26
15
0
28 Sep 2023
Factorizers for Distributed Sparse Block Codes
Michael Hersche
Aleksandar Terzić
G. Karunaratne
Jovin Langenegger
Angeline Pouget
G. Cherubini
Luca Benini
Abu Sebastian
Abbas Rahimi
37
4
0
24 Mar 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
22
31
0
27 Jan 2023
Recurrent Memory Transformer
Aydar Bulatov
Yuri Kuratov
Mikhail Burtsev
CLL
11
101
0
14 Jul 2022
NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results
Ren Yang
Radu Timofte
Mei Zheng
Qunliang Xing
Minglang Qiao
...
Yulin Huang
Junying Chen
I. Lee
Sunder Ali Khowaja
Jiseok Yoon
SupR
34
33
0
20 Apr 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
Minsu Kim
Jeong Hun Yeo
Yong Man Ro
13
61
0
04 Apr 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
20
5
0
23 Mar 2022
Memorizing Transformers
Yuhuai Wu
M. Rabe
DeLesley S. Hutchins
Christian Szegedy
RALM
16
171
0
16 Mar 2022
Pruning Self-attentions into Convolutional Layers in Single Path
Haoyu He
Jianfei Cai
Jing Liu
Zizheng Pan
Jing Zhang
Dacheng Tao
Bohan Zhuang
ViT
31
40
0
23 Nov 2021
Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems
Victoria Mingote
A. Miguel
A. O. Giménez
Eduardo Lleida Solano
25
10
0
06 Nov 2021
The Efficiency Misnomer
Mostafa Dehghani
Anurag Arnab
Lucas Beyer
Ashish Vaswani
Yi Tay
32
98
0
25 Oct 2021
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
Zijun Sun
Xiaoya Li
Xiaofei Sun
Yuxian Meng
Xiang Ao
Qing He
Fei Wu
Jiwei Li
SSeg
46
183
0
30 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
37
813
0
14 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
32
1,086
0
08 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning
Rayhane Mama
Marc S. Tyndel
Hashiam Kadhim
Cole Clifford
Ragavan Thurairatnam
VGen
19
12
0
08 Jun 2021
Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Emilio Parisotto
Ruslan Salakhutdinov
37
43
0
04 Apr 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
11
2,070
0
11 Jan 2021
SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras
Nikita Kitaev
Augustus Odena
A. Dimakis
23
44
0
11 Oct 2020
Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
Andrea Madotto
Samuel Cahyawijaya
Genta Indra Winata
Yan Xu
Zihan Liu
Zhaojiang Lin
Pascale Fung
34
59
0
28 Sep 2020
Efficient Transformers: A Survey
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
74
1,101
0
14 Sep 2020
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
Peratham Wiriyathammabhum
16
8
0
21 May 2020
Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres
Léopold Crestel
23
18
0
21 Apr 2020
PIC: Permutation Invariant Convolution for Recognizing Long-range Activities
Noureldien Hussein
E. Gavves
A. Smeulders
VLM
18
13
0
18 Mar 2020
Memory-Based Graph Networks
Amir Hosein Khas Ahmadi
Kaveh Hassani
Parsa Moradi
Leo Lee
Q. Morris
GNN
26
90
0
21 Feb 2020
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
13
1,987
0
10 Feb 2020
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
11
618
0
13 Nov 2019
CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar
Bryan McCann
L. Varshney
Caiming Xiong
R. Socher
AI4CE
49
1,232
0
11 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,950
0
20 Apr 2018