DeBERTa: Decoding-enhanced BERT with Disentangled Attention

5 June 2020

Xiaodong Liu

Papers citing "DeBERTa: Decoding-enhanced BERT with Disentangled Attention"

50 / 1,037 papers shown

FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing Chen Cecilia Liu Jonas Pfeiffer Ivan Vulić Iryna Gurevych CLL 17 9 0 13 Jan 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers Leonid Boytsov Preksha Patel Vivek Sourabh Riddhi Nisar Sayan Kundu R. Ramanathan Eric Nyberg 19 19 0 08 Jan 2023
A Length-Extrapolatable Transformer Yutao Sun Li Dong Barun Patra Shuming Ma Shaohan Huang Alon Benhaim Vishrav Chaudhary Xia Song Furu Wei 24 115 0 20 Dec 2022
MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue Nikita Moghe E. Razumovskaia Liane Guillou Ivan Vulić Anna Korhonen Alexandra Birch 19 13 0 20 Dec 2022
Fine-Grained Distillation for Long Document Retrieval Yucheng Zhou Tao Shen Xiubo Geng Chongyang Tao Guodong Long Can Xu Daxin Jiang RALM 11 28 0 20 Dec 2022
BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics Liang Ma Shuyang Cao Robert L. Logan IV Di Lu Shihao Ran Kecheng Zhang Joel R. Tetreault A. Jaimes 13 6 0 20 Dec 2022
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training Jing-ling Huang Zhengxuan Wu Kyle Mahowald Christopher Potts 19 13 0 19 Dec 2022
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor Or Honovich Thomas Scialom Omer Levy Timo Schick ALM 31 359 0 19 Dec 2022
Latent Diffusion for Language Generation Justin Lovelace Varsha Kishore Chao-gang Wan Eliot Shekhtman Kilian Q. Weinberger DiffM 19 71 0 19 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer Simiao Zuo Xiaodong Liu Jian Jiao Denis Xavier Charles Eren Manavoglu Tuo Zhao Jianfeng Gao 117 36 0 15 Dec 2022
Discovering Latent Knowledge in Language Models Without Supervision Collin Burns Haotian Ye Dan Klein Jacob Steinhardt 45 321 0 07 Dec 2022
Utilizing Background Knowledge for Robust Reasoning over Traffic Situations Jiarui Zhang Filip Ilievski Aravinda Kollaa Jonathan M Francis Kaixin Ma A. Oltramari 16 2 0 04 Dec 2022
IRRGN: An Implicit Relational Reasoning Graph Network for Multi-turn Response Selection Jingcheng Deng Hengwei Dai Xuewei Guo Yuanchen Ju Wei Peng LRM 17 2 0 01 Dec 2022
Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets Fabian Karl A. Scherp VLM 11 20 0 30 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks Parag Dakle Sai Krishna Rallabandi Preethi Raghavan AI4CE 20 3 0 27 Nov 2022
The Naughtyformer: A Transformer Understands Offensive Humor Leonard Tang Alexander Cai Steve Li Jason Wang 6 3 0 25 Nov 2022
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction Kai Shen Yichong Leng Xuejiao Tan Si-Qi Tang Yuan Zhang Wenjie Liu Ed Lin 22 13 0 23 Nov 2022
Hypergraph Transformer for Skeleton-based Action Recognition Yuxuan Zhou Zhi-Qi Cheng C. Li Yanwen Fang Yifeng Geng Xuansong Xie M. Keuper ViT 18 52 0 17 Nov 2022
A Universal Discriminator for Zero-Shot Generalization Haike Xu Zongyu Lin Jing Zhou Yanan Zheng Zhilin Yang AI4CE 13 14 0 15 Nov 2022
LERT: A Linguistically-motivated Pre-trained Language Model Yiming Cui Wanxiang Che Shijin Wang Ting Liu 13 24 0 10 Nov 2022
Word Order Matters when you Increase Masking Karim Lasri Alessandro Lenci Thierry Poibeau 28 7 0 08 Nov 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? Saadia Gabriel Hamid Palangi Yejin Choi AAML 24 1 0 08 Nov 2022
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications Juan Pablo Zuluaga Karel Veselý Igor Szöke Alexander Blatt P. Motlícek ... Claudia Cevenini Pavel Kolcárek Allan Tart J. Černocký Dietrich Klakow 22 23 0 08 Nov 2022
Using Deep Mixture-of-Experts to Detect Word Meaning Shift for TempoWiC Ze Chen Kangxu Wang Zijian Cai Jiewen Zheng Jiarong He Max Gao Jason Zhang MoE 14 3 0 07 Nov 2022
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning Yu Meng Martin Michalski Jiaxin Huang Yu Zhang Tarek F. Abdelzaher Jiawei Han VLM 39 46 0 06 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation Abhilasha Ravichander Matt Gardner Ana Marasović 19 33 0 01 Nov 2022
Solving Math Word Problems via Cooperative Reasoning induced Language Models Xinyu Zhu Junjie Wang Lin Zhang Yuxiang Zhang Ruyi Gan Jiaxing Zhang Yujiu Yang ReLM LRM 16 75 0 28 Oct 2022
Learning on Large-scale Text-attributed Graphs via Variational Inference Jianan Zhao Meng Qu Chaozhuo Li Hao Yan Qian Liu Rui Li Xing Xie Jian Tang VLM 8 131 0 26 Oct 2022
Uncertainty Sentence Sampling by Virtual Adversarial Perturbation Han Zhang Zhen Zhang Hongfei Jiang Yang Song 17 0 0 26 Oct 2022
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding Md Mosharaf Hossain Eduardo Blanco 22 4 0 26 Oct 2022
ExPUNations: Augmenting Puns with Keywords and Explanations Jiao Sun Anjali Narayan-Chen Shereen Oraby Alessandra Cervone Tagyoung Chung Jing Huang Yang Liu Nanyun Peng 16 10 0 24 Oct 2022
The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative Leonie Weissweiler Valentin Hofmann Abdullatif Köksal Hinrich Schütze 27 31 0 24 Oct 2022
ComFact: A Benchmark for Linking Contextual Commonsense Knowledge Silin Gao Jena D. Hwang Saya Kanno Hiromi Wakaki Yuki Mitsufuji Antoine Bosselut HILM 30 16 0 23 Oct 2022
InforMask: Unsupervised Informative Masking for Language Model Pretraining Nafis Sadeq Canwen Xu Julian McAuley 17 13 0 21 Oct 2022
On Feature Learning in the Presence of Spurious Correlations Pavel Izmailov Polina Kirichenko Nate Gruver A. Wilson 21 116 0 20 Oct 2022
Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts Xiangyang Liu Tianxiang Sun Xuanjing Huang Xipeng Qiu VLM 21 27 0 20 Oct 2022
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective Adaku Uchendu Thai Le Dongwon Lee DeLMO 19 40 0 19 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning Shuo Xie Jiahao Qiu Ankita Pasad Li Du Qing Qu Hongyuan Mei 30 16 0 18 Oct 2022
ConReader: Exploring Implicit Relations in Contracts for Contract Clause Extraction Weiwen Xu Yang Deng Wenqiang Lei Wenlong Zhao Tat-Seng Chua W. Lam AILaw 18 6 0 17 Oct 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation Tanay Dixit Bhargavi Paranjape Hannaneh Hajishirzi Luke Zettlemoyer SyDa 135 23 0 10 Oct 2022
Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis Yuxin Xiao Paul Pu Liang Umang Bhatt W. Neiswanger Ruslan Salakhutdinov Louis-Philippe Morency 173 86 0 10 Oct 2022
Are Representations Built from the Ground Up? An Empirical Examination of Local Composition in Language Models Emmy Liu Graham Neubig CoGe 13 10 0 07 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding Jingye Chen Tengchao Lv Lei Cui Changrong Zhang Furu Wei 48 13 0 06 Oct 2022
polyBERT: A chemical language model to enable fully machine-driven ultrafast polymer informatics Christopher Kuenneth R. Ramprasad 26 100 0 29 Sep 2022
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding Erica K. Shimomoto Edison Marrese-Taylor Hiroya Takamura Ichiro Kobayashi Hideki Nakayama Yusuke Miyao 13 7 0 26 Sep 2022
Whodunit? Learning to Contrast for Authorship Attribution Bo Ai Yuchen Wang Yugin Tan Samson Tan SSL 8 15 0 23 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge Shikhar Singh Ehsan Qasemi Muhao Chen 29 6 0 15 Sep 2022
Testing Pre-trained Language Models' Understanding of Distributivity via Causal Mediation Analysis Pangbo Ban Yifan Jiang Tianran Liu Shane Steinert-Threlkeld 37 4 0 11 Sep 2022
5q032e@SMM4H'22: Transformer-based classification of premise in tweets related to COVID-19 Vadim Porvatov Natalia Semenova 27 2 0 08 Sep 2022
K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation Weixi Zhao Weiqiang Wang ViT 3DPC 19 2 0 24 Aug 2022