Do Transformer Modifications Transfer Across Implementations and Applications?
23 February 2021
Sharan Narang, Hyung Won Chung, Yi Tay, W. Fedus, Thibault Févry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam M. Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
arXiv: 2102.11972
Papers citing "Do Transformer Modifications Transfer Across Implementations and Applications?" (20 of 20 papers shown)

Each entry below gives the title, the authors, topic tags where the source page assigned them, the page's three counters (the middle figure is the paper's citation count), and the publication date.
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya, Rahmatollah Beheshti
120 / 0 / 0 · 23 Apr 2025

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
15 / 41 / 0 · 12 Jul 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf; tags: VLM
92 / 2,306 / 0 · 09 Nov 2022

Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong
30 / 3 / 0 · 27 Oct 2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, W. Fedus, J. Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler; tags: AI4CE
32 / 100 / 0 · 21 Jul 2022

UL2: Unifying Language Learning Paradigms
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason W. Wei, ..., Tal Schuster, H. Zheng, Denny Zhou, N. Houlsby, Donald Metzler; tags: AI4CE
57 / 294 / 0 · 10 May 2022

To Know by the Company Words Keep and What Else Lies in the Vicinity
Jake Williams, H. Heidenreich
16 / 0 / 0 · 30 Apr 2022

deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
Dennis Ulmer, Christian Hardmeier, J. Frellsen
40 / 42 / 0 · 14 Apr 2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao
42 / 32 / 0 · 13 Apr 2022

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
Jun Rao, Fei-Yue Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, Dacheng Tao; tags: OOD
29 / 28 / 0 · 08 Mar 2022

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
71 / 222 / 0 · 21 Feb 2022

cosFormer: Rethinking Softmax in Attention
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
24 / 211 / 0 · 17 Feb 2022

Improving Neural Machine Translation by Denoising Training
Liang Ding, Keqin Peng, Dacheng Tao; tags: VLM, AI4CE
25 / 6 / 0 · 19 Jan 2022

The Efficiency Misnomer
Daoyuan Chen, Liuyi Yao, Dawei Gao, Ashish Vaswani, Yaliang Li
32 / 98 / 0 · 25 Oct 2021

HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO
Katharina Eggensperger, Philip Muller, Neeratyoy Mallik, Matthias Feurer, René Sass, Aaron Klein, Noor H. Awad, Marius Lindauer, Frank Hutter
35 / 100 / 0 · 14 Sep 2021

SHAPE: Shifted Absolute Position Embedding for Transformers
Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui
231 / 45 / 0 · 13 Sep 2021

Revisiting Deep Learning Models for Tabular Data
Yu. V. Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko; tags: LMTD
19 / 696 / 0 · 22 Jun 2021

Do Transformers Really Perform Bad for Graph Representation?
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu; tags: GNN
23 / 433 / 0 · 09 Jun 2021

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
24 / 119 / 0 · 30 Nov 2020

Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
93 / 142 / 0 · 24 Oct 2020