Rethinking Positional Encoding in Language Pre-training
arXiv:2006.15595 (v4, latest)

28 June 2020
Guolin Ke
Di He
Tie-Yan Liu
Code: GitHub (251★)

Papers citing "Rethinking Positional Encoding in Language Pre-training"

50 / 172 papers shown
From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique
Sina Elahimanesh
Shayan Salehi
Sara Zahedi Movahed
Lisa Alazraki
Ruoyu Hu
Abbas Edalat
54
0
0
13 Oct 2023
Fast-ELECTRA for Efficient Pre-training
Chengyu Dong
Liyuan Liu
Hao Cheng
Jingbo Shang
Jianfeng Gao
Xiaodong Liu
79
2
0
11 Oct 2023
Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song
Yiqiao Zhong
80
10
0
07 Oct 2023
Spherical Position Encoding for Transformers
Eren Unlu
48
0
0
04 Oct 2023
Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
Young Jin Kim
Raffy Fahim
Hany Awadalla
MQ, MoE
113
20
0
03 Oct 2023
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Chia-Yuan Chang
Helen Zhou
77
11
0
01 Oct 2023
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Chi Han
Qifan Wang
Hao Peng
Wenhan Xiong
Yu Chen
Heng Ji
Sinong Wang
155
61
0
30 Aug 2023
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
Antoine Nzeyimana
SSL
127
0
0
23 Aug 2023
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
Young Jin Kim
Rawn Henry
Raffy Fahim
Hany Awadalla
MQ
82
19
0
16 Aug 2023
Knowledge Distilled Ensemble Model for sEMG-based Silent Speech Interface
Wenqiang Lai
Qihan Yang
Ye Mao
Endong Sun
Jiangnan Ye
RALM
51
0
0
07 Aug 2023
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
Zhen Qin
Dong Li
Weigao Sun
Weixuan Sun
Xuyang Shen
...
Yunshen Wei
Baohong Lv
Xiao Luo
Yu Qiao
Yiran Zhong
94
18
0
27 Jul 2023
Linearized Relative Positional Encoding
Zhen Qin
Weixuan Sun
Kaiyue Lu
Huizhong Deng
Dong Li
Xiaodong Han
Yuchao Dai
Lingpeng Kong
Yiran Zhong
64
13
0
18 Jul 2023
Frameless Graph Knowledge Distillation
Dai Shi
Zhiqi Shao
Yi Guo
Junbin Gao
74
4
0
13 Jul 2023
Multimodal Molecular Pretraining via Modality Blending
Qiying Yu
Yudi Zhang
Yuyan Ni
Shi Feng
Yanyan Lan
Hao Zhou
Jingjing Liu
72
13
0
12 Jul 2023
SPDER: Semiperiodic Damping-Enabled Object Representation
Kathan Shah
Chawin Sitawarin
57
2
0
27 Jun 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset
S. Naeini
Raeid Saqur
M. Saeidi
John Giorgi
Babak Taati
119
11
0
19 Jun 2023
Relational Temporal Graph Reasoning for Dual-task Dialogue Language Understanding
Bowen Xing
Ivor W. Tsang
70
15
0
15 Jun 2023
Monotonic Location Attention for Length Generalization
Jishnu Ray Chowdhury
Cornelia Caragea
LLMAG
56
8
0
31 May 2023
What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization
Yufeng Zhang
Fengzhuo Zhang
Zhuoran Yang
Zhaoran Wang
BDL
104
74
0
30 May 2023
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
Kai Zhang
Jun Yu
Eashan Adhikarla
Rong Zhou
Zhilin Yan
...
Xun Chen
Yong Chen
Quanzheng Li
Hongfang Liu
Lichao Sun
LM&MA, MedIm
103
185
0
26 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
63
8
0
17 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao Yang
Davide Modolo
Zhuowen Tu
Stefano Soatto
VLM
83
2
0
11 May 2023
Pre-training Language Model as a Multi-perspective Course Learner
Beiduo Chen
Shaohan Huang
Zi-qiang Zhang
Wu Guo
Zhen-Hua Ling
Haizhen Huang
Furu Wei
Weiwei Deng
Qi Zhang
58
0
0
06 May 2023
Revisiting the Encoding of Satellite Image Time Series
Xin Cai
Y. Bi
Peter Nicholl
Roy Sterritt
AI4TS
83
5
0
03 May 2023
HST-MRF: Heterogeneous Swin Transformer with Multi-Receptive Field for Medical Image Segmentation
Xiaofei Huang
Hongfang Gong
Jin Zhang
MedIm
121
4
0
10 Apr 2023
EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation
Yingjian Liu
Jiang Li
Xiaoping Wang
Zhigang Zeng
107
15
0
20 Mar 2023
AutoMLP: Automated MLP for Sequential Recommendations
Muyang Li
Zijian Zhang
Xiangyu Zhao
Wanyu Wang
Minghao Zhao
Runze Wu
Ruocheng Guo
AI4TS
57
46
0
11 Mar 2023
Target-Aware Tracking with Long-term Context Attention
Kaijie He
Canlong Zhang
Sheng Xie
Zhixin Li
Zhiwen Wang
66
55
0
27 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
123
133
0
14 Feb 2023
Enhancing Multivariate Time Series Classifiers through Self-Attention and Relative Positioning Infusion
Mehryar Abbasi
Parvaneh Saeedi
AI4TS
84
6
0
13 Feb 2023
Knowledge Distillation in Vision Transformers: A Critical Review
Gousia Habib
Tausifa Jan Saleem
Brejesh Lall
98
16
0
04 Feb 2023
Representation Deficiency in Masked Language Modeling
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
147
7
0
04 Feb 2023
Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
Ahmed Elnaggar
Hazem Essam
Wafaa Salah-Eldin
Walid Moustafa
Mohamed Elkerdawy
Charlotte Rochereau
B. Rost
238
102
0
16 Jan 2023
Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping
Tom Goldstein
MoE
117
91
0
28 Dec 2022
P-Transformer: Towards Better Document-to-Document Neural Machine Translation
Yachao Li
Junhui Li
Jing Jiang
Shimin Tao
Hao Yang
Hao Fei
ViT
64
10
0
12 Dec 2022
Mitigation of Spatial Nonstationarity with Vision Transformers
Lei Liu
Javier E. Santos
Maša Prodanović
Michael J. Pyrcz
50
4
0
09 Dec 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
87
34
0
21 Nov 2022
Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production
Young Jin Kim
Rawn Henry
Raffy Fahim
Hany Awadalla
MoE
83
23
0
18 Nov 2022
Pure Transformer with Integrated Experts for Scene Text Recognition
Yew Lee Tan
A. Kong
Jung-jae Kim
ViT
102
18
0
09 Nov 2022
AutoAttention: Automatic Field Pair Selection for Attention in User Behavior Modeling
Zuowu Zheng
Xiaofeng Gao
Junwei Pan
Qicong Luo
Guihai Chen
Dapeng Liu
Jie Jiang
86
6
0
27 Oct 2022
The Curious Case of Absolute Position Embeddings
Koustuv Sinha
Amirhossein Kazemnejad
Siva Reddy
J. Pineau
Dieuwke Hupkes
Adina Williams
135
15
0
23 Oct 2022
Transformers Learn Shortcuts to Automata
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
OffRL, LRM
161
178
0
19 Oct 2022
What Makes Convolutional Models Great on Long Sequence Modeling?
Yuhong Li
Tianle Cai
Yi Zhang
De-huai Chen
Debadeepta Dey
VLM
108
98
0
17 Oct 2022
Better Pre-Training by Reducing Representation Confusion
Haojie Zhang
Mingfei Liang
Ruobing Xie
Zhen Sun
Bo Zhang
Leyu Lin
47
2
0
09 Oct 2022
Melody Infilling with User-Provided Structural Context
Chih-Pin Tan
A. Su
Yi-Hsuan Yang
80
3
0
06 Oct 2022
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering
Xiaofei Huang
Hongfang Gong
MedIm
106
14
0
01 Oct 2022
Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition
Ruiqi Wang
Wonse Jo
Dezhong Zhao
Weizheng Wang
B. Yang
Guohua Chen
Byung-Cheol Min
HAI
95
31
0
30 Sep 2022
Mega: Moving Average Equipped Gated Attention
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
143
185
0
21 Sep 2022
Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang
Linyi Yang
Zhiyang Teng
M. Zhou
Yue Zhang
GNN
81
1
0
08 Sep 2022
Learning Program Representations with a Tree-Structured Transformer
Wenhan Wang
Kechi Zhang
Ge Li
Shangqing Liu
Anran Li
Zhi Jin
Yang Liu
71
5
0
18 Aug 2022