arXiv:2104.06022
Lessons on Parameter Sharing across Layers in Transformers
13 April 2021
Sho Takase
Shun Kiyono
Papers citing
"Lessons on Parameter Sharing across Layers in Transformers"
16 / 16 papers shown
Title
Adaptive Additive Parameter Updates of Vision Transformers for Few-Shot Continual Learning
Kyle Stein
A. Mahyari
Guillermo Francia III
Eman El-Sheikh
CLL
58
0
0
11 Apr 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
45
0
0
10 Jan 2025
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
68
5
0
28 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
37
3
0
02 Oct 2024
KALE-LM: Unleash The Power Of AI For Science Via Knowledge And Logic Enhanced Large Model
Weichen Dai
Yezeng Chen
Zijie Dai
Zhijie Huang
Y. Liu
...
Chengli Zhong
Xinhe Li
Zeyu Wang
Zhuoying Feng
Yi Zhou
33
0
0
27 Sep 2024
MALT: Multi-scale Action Learning Transformer for Online Action Detection
Zhipeng Yang
Ruoyu Wang
Yang Tan
Liping Xie
OffRL
38
1
0
31 May 2024
Enhancing Context Through Contrast
Kshitij Ambilduke
Aneesh Shetye
Diksha Bagade
Rishika Bhagwatkar
Khurshed Fitter
P. Vagdargi
Shital S. Chiddarwar
26
0
0
06 Jan 2024
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
22
1
0
07 Jun 2023
Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages
Viet H. Pham
Thang M. Pham
Giang Nguyen
Long H. B. Nguyen
D. Dinh
11
0
0
02 Apr 2023
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction
Yue Yang
Artemis Panagopoulou
Marianna Apidianaki
Mark Yatskar
Chris Callison-Burch
21
2
0
24 Oct 2022
Spiking Neural Networks for event-based action recognition: A new task to understand their advantage
Alex Vicente-Sola
D. L. Manna
Paul Kirkland
G. D. Caterina
Trevor J. Bihl
19
8
0
29 Sep 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders
Jay Mahadeokar
Yangyang Shi
Ke Li
Duc Le
Jiedan Zhu
Vikas Chandra
Ozlem Kalinli
M. Seltzer
17
15
0
29 Mar 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge
Si-Qing Chen
Furu Wei
MoE
11
21
0
16 Feb 2022
Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun
Diyi Yang
Xiaoya Li
Tianwei Zhang
Yuxian Meng
Han Qiu
Guoyin Wang
Eduard H. Hovy
Jiwei Li
17
44
0
20 Oct 2021
Is Attention always needed? A Case Study on Language Identification from Speech
A. Mandal
Santanu Pal
Indranil Dutta
Mahidas Bhattacharya
S. Naskar
11
6
0
05 Oct 2021
On Compositional Generalization of Neural Machine Translation
Yafu Li
Yongjing Yin
Yulong Chen
Yue Zhang
148
44
0
31 May 2021