ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.11062
  4. Cited By
Scaling Transformer to 1M tokens and beyond with RMT

Scaling Transformer to 1M tokens and beyond with RMT

19 April 2023
Aydar Bulatov
Yuri Kuratov
Yermek Kapushev
Mikhail Burtsev
    LRM
ArXivPDFHTML

Papers citing "Scaling Transformer to 1M tokens and beyond with RMT"

50 / 70 papers shown
Title
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Yifei Yu
Qian Zhang
Lingfeng Qiao
Di Yin
Fang Li
Jie Wang
Z. Chen
Suncong Zheng
Xiaolong Liang
Xingchen Sun
39
0
0
07 Apr 2025
Reasoning on Multiple Needles In A Haystack
Reasoning on Multiple Needles In A Haystack
Yidong Wang
LRM
31
0
0
05 Apr 2025
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack
Yunfan Gao
Yun Xiong
Wenlong Wu
Zijing Huang
Bohan Li
Haoyu Wang
54
3
0
01 Mar 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
68
5
0
28 Feb 2025
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
Zhan Ling
Kang Liu
Kai Yan
Yuqing Yang
Weijian Lin
Ting-Han Fan
Lingfeng Shen
Zhengyin Du
Jiecao Chen
ReLM
ELM
LRM
49
3
0
25 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
137
2
0
20 Jan 2025
IntentGPT: Few-shot Intent Discovery with Large Language Models
IntentGPT: Few-shot Intent Discovery with Large Language Models
Juan A. Rodriguez
Nicholas Botzer
David Vazquez
Christopher Pal
M. Pedersoli
I. Laradji
VLM
68
3
0
16 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
66
6
0
31 Oct 2024
Long Sequence Modeling with Attention Tensorization: From Sequence to
  Tensor Learning
Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning
Aosong Feng
Rex Ying
Leandros Tassiulas
27
2
0
28 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with
  Selective Attention
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
29
1
0
24 Oct 2024
NetSafe: Exploring the Topological Safety of Multi-agent Networks
NetSafe: Exploring the Topological Safety of Multi-agent Networks
Miao Yu
Shilong Wang
Guibin Zhang
Junyuan Mao
Chenlong Yin
Qijiong Liu
Qingsong Wen
Kun Wang
Yang Wang
35
5
0
21 Oct 2024
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Alireza Rezazadeh
Zichao Li
Wei Wei
Yujia Bao
37
4
0
17 Oct 2024
Forgetting Curve: A Reliable Method for Evaluating Memorization
  Capability for Long-context Models
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Xinyu Liu
Runsong Zhao
Pengcheng Huang
Chunyang Xiao
Bei Li
Jingang Wang
Tong Xiao
Jingbo Zhu
28
0
0
07 Oct 2024
LongGenBench: Long-context Generation Benchmark
LongGenBench: Long-context Generation Benchmark
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
45
8
0
05 Oct 2024
System 2 Reasoning Capabilities Are Nigh
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
46
0
0
04 Oct 2024
CSPS: A Communication-Efficient Sequence-Parallelism based Serving
  System for Transformer based Models with Long Prompts
CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts
Zeyu Zhang
Haiying Shen
VLM
29
0
0
23 Sep 2024
Towards LifeSpan Cognitive Systems
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
144
1
0
20 Sep 2024
Squid: Long Context as a New Modality for Energy-Efficient On-Device
  Language Models
Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Wei Chen
Zhiyuan Li
Shuo Xin
Yihao Wang
36
4
0
28 Aug 2024
ChatLogic: Integrating Logic Programming with Large Language Models for
  Multi-Step Reasoning
ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning
Zhongsheng Wang
Jiamou Liu
Qiming Bao
Hongfei Rong
Jingfeng Zhang
KELM
LRM
45
4
0
14 Jul 2024
InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long
  Motion Generation
InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
Zeyu Zhang
Akide Liu
Qi Chen
Feng Chen
Ian Reid
Richard Hartley
Bohan Zhuang
Hao Tang
Mamba
31
9
0
14 Jul 2024
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
Petr Anokhin
Nikita Semenov
Artyom Sorokin
Dmitry Evseev
Mikhail Burtsev
Evgeny Burnaev
Evgeny Burnaev
LLMAG
RALM
KELM
47
7
0
05 Jul 2024
Hidden Holes: topological aspects of language models
Hidden Holes: topological aspects of language models
Stephen Fitz
P. Romero
Jiyan Jonas Schneider
35
0
0
09 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho
Sangmin Bae
Taehyeon Kim
Hyunjik Jo
Yireun Kim
Tal Schuster
Adam Fisch
James Thorne
Se-Young Yun
45
8
0
04 Jun 2024
MVAD: A Multiple Visual Artifact Detector for Video Streaming
MVAD: A Multiple Visual Artifact Detector for Video Streaming
Chen Feng
Duolikun Danier
Fan Zhang
David Bull
25
0
0
31 May 2024
Toward Conversational Agents with Context and Time Sensitive Long-term
  Memory
Toward Conversational Agents with Context and Time Sensitive Long-term Memory
Nick Alonso
Tomás Figliolia
A. Ndirango
Beren Millidge
RALM
3DV
58
3
0
29 May 2024
On the Role of Attention Masks and LayerNorm in Transformers
On the Role of Attention Masks and LayerNorm in Transformers
Xinyi Wu
A. Ajorlou
Yifei Wang
Stefanie Jegelka
Ali Jadbabaie
43
9
0
29 May 2024
Unifying Demonstration Selection and Compression for In-Context Learning
Unifying Demonstration Selection and Compression for In-Context Learning
Jun Gao
Ziqiang Cao
Wenjie Li
38
3
0
27 May 2024
Compressing Lengthy Context With UltraGist
Compressing Lengthy Context With UltraGist
Peitian Zhang
Zheng Liu
Shitao Xiao
Ninglu Shao
Qiwei Ye
Zhicheng Dou
27
4
0
26 May 2024
Mixture of In-Context Prompters for Tabular PFNs
Mixture of In-Context Prompters for Tabular PFNs
Derek Xu
Olcay Cirit
Reza Asadi
Yizhou Sun
Wei Wang
31
9
0
25 May 2024
Incorporating Exponential Smoothing into MLP: A Simple but Effective
  Sequence Model
Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
Jiqun Chu
Zuoquan Lin
AI4TS
30
2
0
26 Mar 2024
Bifurcated Attention: Accelerating Massively Parallel Decoding with
  Shared Prefixes in LLMs
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Ben Athiwaratkun
Sujan Kumar Gonugondla
Sanjay Krishna Gouda
Haifeng Qian
Hantian Ding
...
Liangfu Chen
Parminder Bhatia
Ramesh Nallapati
Sudipta Sengupta
Bing Xiang
56
4
0
13 Mar 2024
TaylorShift: Shifting the Complexity of Self-Attention from Squared to
  Linear (and Back) using Taylor-Softmax
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen
Sebastián M. Palacio
Andreas Dengel
51
3
0
05 Mar 2024
MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained
  Language Models
MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models
Nathanaël Carraz Rakotonirina
Marco Baroni
VLM
KELM
19
0
0
23 Feb 2024
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs
  Miss
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
119
33
0
16 Feb 2024
Flexibly Scaling Large Language Models Contexts Through Extensible
  Tokenization
Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization
Ninglu Shao
Shitao Xiao
Zheng Liu
Peitian Zhang
28
4
0
15 Jan 2024
Extending LLMs' Context Window with 100 Samples
Extending LLMs' Context Window with 100 Samples
Yikai Zhang
Junlong Li
Pengfei Liu
31
11
0
13 Jan 2024
Attendre: Wait To Attend By Retrieval With Evicted Queries in
  Memory-Based Transformers for Long Context Processing
Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing
Zi Yang
Nan Hua
RALM
34
4
0
10 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
16
12
0
07 Jan 2024
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved
  Pre-Training
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Alex Jinpeng Wang
Linjie Li
K. Lin
Jianfeng Wang
Kevin Lin
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
VLM
VGen
29
12
0
01 Jan 2024
Marathon: A Race Through the Realm of Long Context with Large Language
  Models
Marathon: A Race Through the Realm of Long Context with Large Language Models
Lei Zhang
Yunshui Li
Ziqiang Liu
Jiaxi Yang
Junhao Liu
Longze Chen
Run Luo
Min Yang
OffRL
LRM
45
5
0
15 Dec 2023
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long
  Documents
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
James Enouen
Hootan Nakhost
Sayna Ebrahimi
Sercan Ö. Arik
Yan Liu
Tomas Pfister
33
5
0
03 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language
  Models: A Comprehensive Survey
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
36
54
0
21 Nov 2023
Think Before You Speak: Cultivating Communication Skills of Large
  Language Models via Inner Monologue
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
Junkai Zhou
Liang Pang
Huawei Shen
Xueqi Cheng
LRM
22
4
0
13 Nov 2023
LooGLE: Can Long-Context Language Models Understand Long Contexts?
LooGLE: Can Long-Context Language Models Understand Long Contexts?
Jiaqi Li
Mengmeng Wang
Zilong Zheng
Muhan Zhang
ELM
RALM
32
107
0
08 Nov 2023
Breaking the Token Barrier: Chunking and Convolution for Efficient Long
  Text Classification with BERT
Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT
Aman Jaiswal
E. Milios
VLM
11
7
0
31 Oct 2023
General-Purpose Retrieval-Enhanced Medical Prediction Model Using
  Near-Infinite History
General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History
Junu Kim
Chaeeun Shim
Bosco Seong Kyu Yang
Chami Im
Sung Yoon Lim
Han-Gil Jeong
Edward Choi
28
8
0
31 Oct 2023
CLEX: Continuous Length Extrapolation for Large Language Models
CLEX: Continuous Length Extrapolation for Large Language Models
Guanzheng Chen
Xin Li
Zaiqiao Meng
Shangsong Liang
Li Bing
15
29
0
25 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive
  Reading
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
68
72
0
08 Oct 2023
LEGO-Prover: Neural Theorem Proving with Growing Libraries
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang
Huajian Xin
Chuanyang Zheng
Lin Li
Zhengying Liu
...
Enze Xie
Jian Yin
Zhenguo Li
Heng Liao
Xiaodan Liang
LRM
39
63
0
01 Oct 2023
A Framework for Inference Inspired by Human Memory Mechanisms
A Framework for Inference Inspired by Human Memory Mechanisms
Xiangyu Zeng
Jie Lin
Piao Hu
Ruizheng Huang
Zhicheng Zhang
18
2
0
01 Oct 2023
12
Next