ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

8 June 2020
Marius Mosbach
Maksym Andriushchenko
Dietrich Klakow

Papers citing "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines"

50 / 82 papers shown
Decoding Reading Goals from Eye Movements
  Omer Shubi, Cfir Avraham Hadar, Yevgeni Berzak (28 Oct 2024)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
  Adrian Chan, Anupam Mijar, Mehreen Saeed, Chau-Wai Wong, Akram Khater (03 Oct 2024)

Efficient LLM Context Distillation
  Rajesh Upadhayayaya, Zachary Smith, Chritopher Kottmyer, Manish Raj Osti (03 Sep 2024)

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models
  Christopher Schröder, Gerhard Heyer (13 Jun 2024)

Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
  Myrthe Reuver, Suzan Verberne, Antske Fokkens (05 Apr 2024)

CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
  Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon Camarasa, Zaiqiao Meng, R. McCreadie (22 Feb 2024)

FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
  Yupei Du, Albert Gatt, Dong Nguyen (10 Oct 2023)

Prompt to be Consistent is Better than Self-Consistent? Few-Shot and Zero-Shot Fact Verification with Pre-trained Language Models
  Fengzhu Zeng, Wei Gao (05 Jun 2023)

Understanding Emotion Valence is a Joint Deep Learning Task
  Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi (27 May 2023)

Toward Connecting Speech Acts and Search Actions in Conversational Search Tasks
  Souvick Ghosh, Satanu Ghosh, C. Shah (08 May 2023)

KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis
  Antoine Nzeyimana (25 Apr 2023)

On the Variance of Neural Network Training with respect to Test Sets and Distributions
  Keller Jordan (04 Apr 2023)

Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
  Antonis Maronikolakis, Abdullatif Köksal, Hinrich Schütze (04 Apr 2023)

Measuring the Instability of Fine-Tuning
  Yupei Du, D. Nguyen (15 Feb 2023)

Evaluating the Robustness of Discrete Prompts
  Yoichi Ishibashi, Danushka Bollegala, Katsuhito Sudoh, Satoshi Nakamura (11 Feb 2023)

Multi-Tenant Optimization For Few-Shot Task-Oriented FAQ Retrieval
  Asha Vishwanathan, R. Warrier, G. V. Suresh, Chandrashekhar Kandpal (25 Jan 2023)

A Stability Analysis of Fine-Tuning a Pre-Trained Model
  Z. Fu, Anthony Man-Cho So, Nigel Collier (24 Jan 2023)

InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
  Leonid Boytsov, Preksha Patel, Vivek Sourabh, Riddhi Nisar, Sayan Kundu, R. Ramanathan, Eric Nyberg (08 Jan 2023)

Examining Political Rhetoric with Epistemic Stance Detection
  Ankita Gupta, Su Lin Blodgett, Justin H. Gross, Brendan T. O'Connor (29 Dec 2022)

KL Regularized Normalization Framework for Low Resource Tasks
  Neeraj Kumar, Ankur Narang, Brejesh Lall (21 Dec 2022)

Task-Specific Embeddings for Ante-Hoc Explainable Text Classification
  Kishaloy Halder, Josip Krapac, A. Akbik, Anthony Brew, Matti Lyra (30 Nov 2022)

BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
  Joel Niklaus, Daniele Giofré (30 Nov 2022)

Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods
  Xiang Dai, Sarvnaz Karimi (24 Nov 2022)

Probing neural language models for understanding of words of estimative probability
  Damien Sileo, Marie-Francine Moens (07 Nov 2022)

Gradient Knowledge Distillation for Pre-trained Language Models
  Lean Wang, Lei Li, Xu Sun (02 Nov 2022)

We need to talk about random seeds
  Steven Bethard (24 Oct 2022)

Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping
  Chenghao Yang, Xuezhe Ma (19 Oct 2022)

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
  Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei (18 Oct 2022)

AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
  Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie (12 Oct 2022)

Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
  Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum (10 Oct 2022)

UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation
  I. Sarhan, P. Mosteiro, Marco Spruit (07 Oct 2022)

An Empirical Study on Cross-X Transfer for Legal Judgment Prediction
  Joel Niklaus, Matthias Sturmer, Ilias Chalkidis (25 Sep 2022)

Drawing Causal Inferences About Performance Effects in NLP
  Sandra Wankmüller (14 Sep 2022)

Heuristic-free Optimization of Force-Controlled Robot Search Strategies in Stochastic Environments
  Bastian Alt, Darko Katic, Rainer Jäkel, Michael Beetz (15 Jul 2022)

Zero-shot Cross-lingual Transfer is Under-specified Optimization
  Shijie Wu, Benjamin Van Durme, Mark Dredze (12 Jul 2022)

Pretrained Models for Multilingual Federated Learning
  Orion Weller, Marc Marone, Vladimir Braverman, Dawn J Lawrie, Benjamin Van Durme (06 Jun 2022)

Can Foundation Models Help Us Achieve Perfect Secrecy?
  Simran Arora, Christopher Ré (27 May 2022)

Linear Connectivity Reveals Generalization Strategies
  Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra (24 May 2022)

ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
  Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi (24 May 2022)

Calibration of Natural Language Understanding Models with Venn-ABERS Predictors
  Patrizio Giovannotti (21 May 2022)

Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction
  Manikandan Ravikiran, Bharathi Raja Chakravarthi (12 May 2022)

Few-shot Mining of Naturally Occurring Inputs and Outputs
  Mandar Joshi, Terra Blevins, M. Lewis, Daniel S. Weld, Luke Zettlemoyer (09 May 2022)

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis
  Sandra Wankmüller (03 May 2022)

UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles for Detecting Patronizing and Condescending Language
  David Koleczek, Alexander Scarlatos, Siddha Makarand Karkare, Preshma Linet Pereira (18 Apr 2022)

Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
  Christopher Hidey, Fei Liu, Rahul Goel (10 Apr 2022)

CoCoSoDa: Effective Contrastive Learning for Code Search
  Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun (07 Apr 2022)

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
  Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Ves Stoyanov, Majid Yazdani (03 Apr 2022)

Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?
  Subhabrata Dutta, Jeevesh Juneja, Dipankar Das, Tanmoy Chakraborty (24 Mar 2022)

Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
  Guanzheng Chen, Fangyu Liu, Zaiqiao Meng, Shangsong Liang (16 Feb 2022)

Transformer-based Models of Text Normalization for Speech Applications
  Jae Hun Ro, Felix Stahlberg, Ke Wu, Shankar Kumar (01 Feb 2022)