DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
3 June 2024
Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun

Papers citing "DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion" (3 of 3 papers shown)

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua-Hong Wu, Haifeng Wang
31 Dec 2020

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020