GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant · 5 June 2020 · arXiv: 2006.03274 · Tags: RALM
Cited By
Papers citing "GMAT: Global Memory Augmentation for Transformers" · 33 of 33 papers shown
1. ARLED: Leveraging LED-based ARMAN Model for Abstractive Summarization of Persian Long Documents · Samira Zangooei, Amirhossein Darmani, Hossein Farahmand Nezhad, Laya Mahmoudi · 13 Mar 2025 · 86 / 0 / 0
2. LM2: Large Memory Models · Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis · Tags: KELM · 09 Feb 2025 · 191 / 0 / 0
3. Episodic Memories Generation and Evaluation Benchmark for Large Language Models · Alexis Huet, Zied Ben-Houidi, Dario Rossi · Tags: LLMAG · 21 Jan 2025 · 78 / 2 / 0
4. InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation · Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang · Tags: Mamba · 14 Jul 2024 · 72 / 11 / 0
5. From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models · Eleni Nisioti, Claire Glanois, Elias Najarro, Andrew Dai, Elliot Meyerson, J. Pedersen, Laetitia Teodorescu, Conor F. Hayes, Shyam Sudhakaran, Sebastian Risi · Tags: AI4CE, LM&Ro · 14 Jun 2024 · 103 / 4 / 0
6. TransformerFAM: Feedback attention is working memory · Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, K. Sim, P. M. Mengibar · 14 Apr 2024 · 119 / 12 / 0
7. Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems · Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia · 23 Dec 2023 · 136 / 86 / 0
8. Uncertainty Guided Global Memory Improves Multi-Hop Question Answering · Alsu Sagirova, Mikhail Burtsev · Tags: RALM · 29 Nov 2023 · 97 / 1 / 0
9. On the Long Range Abilities of Transformers · Itamar Zimerman, Lior Wolf · 28 Nov 2023 · 82 / 8 / 0
10. Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems · David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox · 19 Oct 2023 · 116 / 8 / 0
11. Heterogenous Memory Augmented Neural Networks · Zihan Qiu, Zhen Liu, Shuicheng Yan, Shanghang Zhang, Jie Fu · 17 Oct 2023 · 65 / 0 / 0
12. Associative Transformer · Yuwei Sun, H. Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai · Tags: ViT · 22 Sep 2023 · 128 / 0 / 0
13. Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers · Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du · 25 Aug 2023 · 82 / 8 / 0
14. Focus Your Attention (with Adaptive IIR Filters) · Shahar Lutati, Itamar Zimerman, Lior Wolf · 24 May 2023 · 101 / 10 / 0
15. FIT: Far-reaching Interleaved Transformers · Ting Chen, Lala Li · 22 May 2023 · 106 / 13 / 0
16. Scaling Transformer to 1M tokens and beyond with RMT · Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail Burtsev · Tags: LRM · 19 Apr 2023 · 107 / 91 / 0
17. Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis · Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge · 20 Dec 2022 · 79 / 42 / 0
18. Global memory transformer for processing long documents · Arij Al Adel · 03 Dec 2022 · 44 / 5 / 0
19. SeDR: Segment Representation Learning for Long Documents Dense Retrieval · Junying Chen, Qingcai Chen, Dongfang Li, Yutao Huang · 20 Nov 2022 · 67 / 6 / 0
20. Efficient Long-Text Understanding with Short-Text Models · Maor Ivgi, Uri Shaham, Jonathan Berant · Tags: VLM · 01 Aug 2022 · 126 / 84 / 0
21. Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes · A. Sorokin, N. Buzun, Leonid Pugachev, Mikhail Burtsev · 27 Jul 2022 · 160 / 8 / 0
22. Recurrent Memory Transformer · Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev · Tags: CLL · 14 Jul 2022 · 49 / 112 / 0
23. kMaX-DeepLab: k-means Mask Transformer · Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell D. Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen · Tags: ViT · 08 Jul 2022 · 167 / 19 / 0
24. Long Range Language Modeling via Gated State Spaces · Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur · Tags: Mamba · 27 Jun 2022 · 138 / 243 / 0
25. CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation · Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell D. Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen · Tags: ViT, MedIm · 17 Jun 2022 · 121 / 92 / 0
26. ChapterBreak: A Challenge Dataset for Long-Range Language Models · Simeng Sun, Katherine Thai, Mohit Iyyer · 22 Apr 2022 · 51 / 20 / 0
27. Diagonal State Spaces are as Effective as Structured State Spaces · Ankit Gupta, Albert Gu, Jonathan Berant · 27 Mar 2022 · 131 / 316 / 0
28. Interpretable Self-supervised Multi-task Learning for COVID-19 Information Retrieval and Extraction · Nima Ebadi, Peyman Najafirad · 15 Jun 2021 · 34 / 0 / 0
29. Memory-efficient Transformers via Top-k Attention · Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant · Tags: MQ · 13 Jun 2021 · 98 / 60 / 0
30. Luna: Linear Unified Nested Attention · Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer · 03 Jun 2021 · 89 / 113 / 0
31. Value-aware Approximate Attention · Ankit Gupta, Jonathan Berant · 17 Mar 2021 · 69 / 6 / 0
32. MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers · Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen · Tags: ViT · 01 Dec 2020 · 147 / 531 / 0
33. Longformer: The Long-Document Transformer · Iz Beltagy, Matthew E. Peters, Arman Cohan · Tags: RALM, VLM · 10 Apr 2020 · 210 / 4,109 / 0