MoBA: Mixture of Block Attention for Long-Context LLMs

18 February 2025
Enzhe Lu, Z. L. Jiang, Qingbin Liu, Yulun Du, Tao Jiang, Chao Hong, Shixuan Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, N. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, J. Qiu

arXiv: 2502.13189 (abs · PDF · HTML) · HuggingFace (17 upvotes)

Papers citing "MoBA: Mixture of Block Attention for Long-Context LLMs"

34 citing papers shown
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Tianqing Fang, H. Zhang, Haitao Mi, Dong Yu, Zhicheng Dou
19 Sep 2025

FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai, Xiaozhuan Liang, X. Wang, Jin Ma, Haijin Liang, Jinwen Luo, Xinyu Zuo, Lisheng Duan, Yuyang Yin, Xi Chen
16 Sep 2025

CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling
Wenhao Li, Bangcheng Sun, Weihao Ye, Tianyi Zhang, Daohai Yu, Fei Chao, Rongrong Ji
11 Sep 2025

Bidirectional Sparse Attention for Faster Video Diffusion Training
Chenlu Zhan, W. Li, Chuyu Shen, J. Zhang, Suhui Wu, H. Zhang
01 Sep 2025
Mixture of Contexts for Long Video Generation
S. Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, ..., Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein
28 Aug 2025

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
Xinhao Luo, Zihan Liu, Yangjie Zhou, Shihan Fang, Ziyu Huang, ..., Chen Zhang, Shixuan Sun, Zhenzhe Zheng, Chen Chen, Minyi Guo
26 Aug 2025

Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Ran Yan, Youhe Jiang, Binhang Yuan
25 Aug 2025

Efficient Attention Mechanisms for Large Language Models: A Survey
Yutao Sun, Zhenyu Li, Yike Zhang, Tengyu Pan, Bowen Dong, Yuyi Guo, Jianyong Wang
25 Jul 2025
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
Liangbin Xie, Yu Li, Shian Du, Menghan Xia, Xintao Wang, Fanghua Yu, Ziyan Chen, Pengfei Wan, Jiantao Zhou, Chao Dong
24 Jun 2025

PEVLM: Parallel Encoding for Vision-Language Models
Letian Kang, Shixian Luo, Yiqiang Li, Yuxin Yin, Shenxuan Zhou, Xiaoyang Yu, Jin Yang, Yong Wu
24 Jun 2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, ..., Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang
10 Jun 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, ..., Zhiyuan Liu, Guoyang Zeng, Chao Jia, Dahai Li, Maosong Sun
09 Jun 2025
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang
08 Jun 2025

Rectified Sparse Attention
Yutao Sun, Tianzhu Ye, Li Dong, Yuqing Xia, Jian Chen, Yizhao Gao, S. Cao, Jianyong Wang, Furu Wei
04 Jun 2025

Comba: Improving Bilinear RNNs with Closed-loop Control
Jiaxi Hu, Yongqi Pan, Jusen Du, Disen Lan, Xiaqiang Tang, Qingsong Wen, Yuxuan Liang, Weigao Sun
03 Jun 2025

Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding
Feiyu Yao, Qian Wang
30 May 2025

SALE: Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling
Xiaodong Ji, Hailin Zhang, Fangcheng Fu, Bin Cui
30 May 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You, Enshu Liu, Zhihang Yuan, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang
27 May 2025

Understanding Transformer from the Perspective of Associative Memory
Shu Zhong, Mingyu Xu, Tenglong Ao, Guang Shi
26 May 2025

QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization
Weizhou Shen, Chenliang Li, Fanqi Wan, Shengyi Liao, Shaopeng Lai, ..., Bin Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan
23 May 2025

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang, Zirui Liu, Hongye Jin, Qingyu Yin, Vipin Chaudhary, Xiaotian Han
22 May 2025
Training-Free Efficient Video Generation via Dynamic Token Carving
Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia
22 May 2025

Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning
Yuheng Lu, ZiMeng Bai, Caixia Yuan, Huixing Jiang, Xiaojie Wang
17 May 2025

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, ..., Dayou Du, Tairan Xu, Edoardo Ponti, Luo Mai
16 May 2025

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, ..., Weizhou Shen, Fanqi Wan, Ming Yan, J.N. Zhang, Fei Huang
16 May 2025
WuNeng: Hybrid State with Attention
Liu Xiao, Li Zhiyuan, Lin Yueyu
27 Apr 2025

PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An, Huajun Bai, Ziqiang Liu, Dong Li, E. Barsoum
23 Apr 2025

Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, Wei Wu
23 Apr 2025

MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yue Yang, Lili Qiu
22 Apr 2025

Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin, J. Obando-Ceron, Xu Owen He, Rameswar Panda
09 Apr 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, ..., Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng
27 Mar 2025

Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang, Jing Lian, Linhui Li
04 Mar 2025

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
Yifei Xia, Suhan Ling, Fangcheng Fu, Yijiao Wang, Huixia Li, Xuefeng Xiao, Tengjiao Wang
28 Feb 2025

Neural Attention Search
Difan Deng, Marius Lindauer
18 Feb 2025