Rethinking Positional Encoding in Language Pre-training

28 June 2020
Guolin Ke, Di He, Tie-Yan Liu
arXiv: 2006.15595 (abs · PDF · HTML) · GitHub (251★)

Papers citing "Rethinking Positional Encoding in Language Pre-training"

50 / 172 papers shown

giMLPs: Gate with Inhibition Mechanism in MLPs
Cheng Kang, Jindřich Prokop, Lei Tong, Huiyu Zhou, Yong Hu, Daniel Novak
33 · 0 · 0 · 01 Aug 2022

Generalized Attention Mechanism and Relative Position for Transformer
R. Pandya
ViT
18 · 1 · 0 · 24 Jul 2022

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP
Zhicai Wang, Y. Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He
70 · 8 · 0 · 15 Jul 2022

Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus
Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu
VOS
107 · 37 · 0 · 04 Jul 2022

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
LMTD
174 · 140 · 0 · 14 Jun 2022

MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning
Weiguo Pian, Hanyu Peng, Xunzhu Tang, Tiezhu Sun, Haoye Tian, Andrew Habib, Jacques Klein, Tegawendé F. Bissyandé
56 · 12 · 0 · 13 Jun 2022

Do we really need temporal convolutions in action segmentation?
Dazhao Du, Fuchun Sun, Yu Li, Zhongang Qi, Hui Xiong, Ying Shan
ViT
69 · 17 · 0 · 26 May 2022

Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
139 · 54 · 0 · 26 May 2022

Neural Additive Models for Nowcasting
Wonkeun Jo, Dongil Kim
116 · 4 · 0 · 20 May 2022

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
98 · 73 · 0 · 20 May 2022

Trading Positional Complexity vs. Deepness in Coordinate Networks
Jianqiao Zheng, Sameera Ramasinghe, Xueqian Li, Simon Lucey
101 · 19 · 0 · 18 May 2022

Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction
Manikandan Ravikiran, Bharathi Raja Chakravarthi
100 · 3 · 0 · 12 May 2022

Decoupled Side Information Fusion for Sequential Recommendation
Yueqi Xie, Peilin Zhou, Sunghun Kim
102 · 114 · 0 · 23 Apr 2022

DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang
42 · 3 · 0 · 19 Apr 2022

Dynamic Position Encoding for Transformers
Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban
21 · 1 · 0 · 18 Apr 2022

3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume
Jianye Pang, Cheng Jiang, Yihao Chen, Jianbo Chang, M. Feng, Renzhi Wang, Jianhua Yao
ViT, MedIm
55 · 11 · 0 · 14 Apr 2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao
118 · 32 · 0 · 13 Apr 2022

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song
MoE
78 · 16 · 0 · 07 Apr 2022

Visual Abductive Reasoning
Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang
LRM
92 · 40 · 0 · 26 Mar 2022

LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
Zhigang Jiang, Zhongzheng Xiang, Jinhua Xu, Mingbi Zhao
ViT, 3DV
100 · 35 · 0 · 03 Mar 2022

FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks
Maksim Zubkov, Daniil Gavrilov
47 · 0 · 0 · 23 Feb 2022

General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, C. Nash, ..., Hannah R. Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel
110 · 66 · 0 · 15 Feb 2022

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
MLLM, ObjD
176 · 884 · 0 · 07 Feb 2022

Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers
Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup
83 · 4 · 0 · 01 Feb 2022

Rewiring with Positional Encodings for Graph Neural Networks
Rickard Brüel-Gabrielsson, Mikhail Yurochkin, Justin Solomon
AI4CE
117 · 33 · 0 · 29 Jan 2022

SoK: Vehicle Orientation Representations for Deep Rotation Estimation
Huahong Tu, Siyuan Peng, Vladimir Leung, Richard Gao
3DPC
84 · 1 · 0 · 08 Dec 2021

SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin, Heng Fan, Zhipeng Zhang, Yong-mei Xu, Haibin Ling
ViT
107 · 322 · 0 · 02 Dec 2021

Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, ..., Yue Cao, Zheng Zhang, Li Dong, Furu Wei, B. Guo
ViT
284 · 1,839 · 0 · 18 Nov 2021

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang
94 · 83 · 0 · 07 Nov 2021

Can Vision Transformers Perform Convolution?
Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh
ViT
110 · 21 · 0 · 02 Nov 2021

Relative Molecule Self-Attention Transformer
Lukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiński, Jacek Tabor, Igor T. Podolak, Pawel M. Morkisz, Stanislaw Jastrzebski
MedIm
92 · 36 · 0 · 12 Oct 2021

Learning to Iteratively Solve Routing Problems with Dual-Aspect Collaborative Transformer
Yining Ma, Jingwen Li, Zhiguang Cao, Wen Song, Le Zhang, Zhenghua Chen, Jing Tang
168 · 140 · 0 · 06 Oct 2021

Multiplicative Position-aware Transformer Models for Language Understanding
Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang
27 · 1 · 0 · 27 Sep 2021

The Impact of Positional Encodings on Multilingual Compression
Vinit Ravishankar, Anders Søgaard
60 · 5 · 0 · 11 Sep 2021

Ultra-high Resolution Image Segmentation via Locality-aware Context Fusion and Alternating Local Enhancement
Wenxi Liu, Qi Li, Xin Lin, Weixiang Yang, Shengfeng He, Yuanlong Yu
76 · 7 · 0 · 06 Sep 2021

Teaching Autoregressive Language Models Complex Tasks By Demonstration
Gabriel Recchia
83 · 22 · 0 · 05 Sep 2021

Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning
Ran Tian, Joshua Maynez, Ankur P. Parikh
ViT
56 · 2 · 0 · 30 Aug 2021

Conditional DETR for Fast Training Convergence
Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei-huan Sun, Jingdong Wang
ViT
116 · 628 · 0 · 13 Aug 2021

Variable-Length Music Score Infilling via XLNet and Musically Specialized Positional Encoding
Chin-Jui Chang, Chun-Yi Lee, Yi-Hsuan Yang
63 · 21 · 0 · 11 Aug 2021

PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution
Pan Xie, Mengyi Zhao, Xiaohui Hu
ViT, SLR
97 · 35 · 0 · 27 Jul 2021

SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers
Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio J. Plaza, Jocelyn Chanussot
ViT
77 · 913 · 0 · 07 Jul 2021

Rethinking Positional Encoding
Jianqiao Zheng, Sameera Ramasinghe, Simon Lucey
85 · 52 · 0 · 06 Jul 2021

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu
95 · 50 · 0 · 23 Jun 2021

Bootstrap Representation Learning for Segmentation on Medical Volumes and Sequences
Zejian Chen, Wei Zhuo, Tianfu Wang, Wufeng Xue, Dong Ni
101 · 6 · 0 · 23 Jun 2021

Large-Scale Chemical Language Representations Capture Molecular Structure and Properties
Jerret Ross, Brian M. Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das
AI4CE
91 · 302 · 0 · 17 Jun 2021

Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models
Tyler A. Chang, Yifan Xu, Weijian Xu, Zhuowen Tu
ViT
57 · 15 · 0 · 10 Jun 2021

Do Transformers Really Perform Bad for Graph Representation?
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
GNN
109 · 443 · 0 · 09 Jun 2021

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
ViT
202 · 1,148 · 0 · 08 Jun 2021

CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, R. Collobert, A. Rogozhnikov
OOD, ViT
84 · 60 · 0 · 06 Jun 2021

The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Ulme Wennberg, G. Henter
MILM
93 · 22 · 0 · 03 Jun 2021