Guiding Attention for Self-Supervised Learning with Transformers
Ameet Deshpande, Karthik Narasimhan
arXiv: 2010.02399 · 6 October 2020
Papers citing "Guiding Attention for Self-Supervised Learning with Transformers" (10 of 10 shown)
| Title | Authors | Tag | Metrics | Date |
|---|---|---|---|---|
| Enhancing Retrosynthesis with Conformer: A Template-Free Method | Jiaxi Zhuang, Qian Zhang, Ying Qian | — | 160 · 0 · 0 | 21 Jan 2025 |
| Sneaking Syntax into Transformer Language Models with Tree Regularization | Ananjan Nandi, Christopher D. Manning, Shikhar Murty | — | 74 · 0 · 0 | 28 Nov 2024 |
| Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance | Jiri Gesi, Iftekhar Ahmed | — | 57 · 0 · 0 | 26 Feb 2024 |
| MUX-PLMs: Data Multiplexing for High-throughput Language Models | Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan | MoE | 26 · 5 · 0 | 24 Feb 2023 |
| Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms | Junghun Kim, Yoojin An, Jihie Kim | — | 20 · 13 · 0 | 21 Aug 2022 |
| Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding | Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren | — | 33 · 9 · 0 | 06 Apr 2022 |
| A Survey of Transformers | Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu | ViT | 53 · 1,088 · 0 | 08 Jun 2021 |
| Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | — | 264 · 4,489 · 0 | 23 Jan 2020 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro | MoE | 245 · 1,826 · 0 | 17 Sep 2019 |
| GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 299 · 6,984 · 0 | 20 Apr 2018 |

(The "Metrics" column reproduces the three per-paper counts exactly as they appeared in the listing, which did not label them.)