arXiv:2304.10321
DropDim: A Regularization Method for Transformer Networks
20 April 2023
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
Papers citing "DropDim: A Regularization Method for Transformer Networks"
6 / 6 papers shown
AttentionDrop: A Novel Regularization Method for Transformer Models
Mirza Samad Ahmed Baig
Syeda Anshrah Gillani
Abdul Akbar Khan
Shahid Munir Shah
28
0
0
16 Apr 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
80
1
0
21 Feb 2025
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xukui Yang
Dan Qu
Weiqiang Zhang
25
9
0
20 Apr 2023
Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xukui Yang
Dan Qu
Zhen Li
16
3
0
20 Apr 2023
M3ST: Mix at Three Levels for Speech Translation
Xuxin Cheng
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Yuexian Zou
13
40
0
07 Dec 2022
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
27
30
0
02 May 2021