ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

8 June 2020
Marius Mosbach
Maksym Andriushchenko
Dietrich Klakow

Papers citing "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines"

50 / 82 papers shown
Decoding Reading Goals from Eye Movements
  Omer Shubi, Cfir Avraham Hadar, Yevgeni Berzak (28 Oct 2024)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
  Adrian Chan, Anupam Mijar, Mehreen Saeed, Chau-Wai Wong, Akram Khater (03 Oct 2024)

Efficient LLM Context Distillation
  Rajesh Upadhayayaya, Zachary Smith, Chritopher Kottmyer, Manish Raj Osti (03 Sep 2024)

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models
  Christopher Schröder, Gerhard Heyer (13 Jun 2024)

Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
  Myrthe Reuver, Suzan Verberne, Antske Fokkens (05 Apr 2024)

CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
  Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon Camarasa, Zaiqiao Meng, R. McCreadie (22 Feb 2024)

FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
  Yupei Du, Albert Gatt, Dong Nguyen (10 Oct 2023)

Prompt to be Consistent is Better than Self-Consistent? Few-Shot and Zero-Shot Fact Verification with Pre-trained Language Models
  Fengzhu Zeng, Wei Gao (05 Jun 2023)

Understanding Emotion Valence is a Joint Deep Learning Task
  Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi (27 May 2023)

Toward Connecting Speech Acts and Search Actions in Conversational Search Tasks
  Souvick Ghosh, Satanu Ghosh, C. Shah (08 May 2023)

KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis
  Antoine Nzeyimana (25 Apr 2023)

On the Variance of Neural Network Training with respect to Test Sets and Distributions
  Keller Jordan (04 Apr 2023)

Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
  Antonis Maronikolakis, Abdullatif Köksal, Hinrich Schütze (04 Apr 2023)

Measuring the Instability of Fine-Tuning
  Yupei Du, D. Nguyen (15 Feb 2023)

Evaluating the Robustness of Discrete Prompts
  Yoichi Ishibashi, Danushka Bollegala, Katsuhito Sudoh, Satoshi Nakamura (11 Feb 2023)

Multi-Tenant Optimization For Few-Shot Task-Oriented FAQ Retrieval
  Asha Vishwanathan, R. Warrier, G. V. Suresh, Chandrashekhar Kandpal (25 Jan 2023)

A Stability Analysis of Fine-Tuning a Pre-Trained Model
  Z. Fu, Anthony Man-Cho So, Nigel Collier (24 Jan 2023)

InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
  Leonid Boytsov, Preksha Patel, Vivek Sourabh, Riddhi Nisar, Sayan Kundu, R. Ramanathan, Eric Nyberg (08 Jan 2023)

Examining Political Rhetoric with Epistemic Stance Detection
  Ankita Gupta, Su Lin Blodgett, Justin H. Gross, Brendan T. O'Connor (29 Dec 2022)

KL Regularized Normalization Framework for Low Resource Tasks
  Neeraj Kumar, Ankur Narang, Brejesh Lall (21 Dec 2022)

Task-Specific Embeddings for Ante-Hoc Explainable Text Classification
  Kishaloy Halder, Josip Krapac, A. Akbik, Anthony Brew, Matti Lyra (30 Nov 2022)

BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
  Joel Niklaus, Daniele Giofré (30 Nov 2022)

Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods
  Xiang Dai, Sarvnaz Karimi (24 Nov 2022)

Probing neural language models for understanding of words of estimative probability
  Damien Sileo, Marie-Francine Moens (07 Nov 2022)

Gradient Knowledge Distillation for Pre-trained Language Models
  Lean Wang, Lei Li, Xu Sun (02 Nov 2022)

We need to talk about random seeds
  Steven Bethard (24 Oct 2022)

Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping
  Chenghao Yang, Xuezhe Ma (19 Oct 2022)

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
  Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei (18 Oct 2022)

AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
  Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie (12 Oct 2022)

Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
  Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum (10 Oct 2022)

UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation
  I. Sarhan, P. Mosteiro, Marco Spruit (07 Oct 2022)

An Empirical Study on Cross-X Transfer for Legal Judgment Prediction
  Joel Niklaus, Matthias Sturmer, Ilias Chalkidis (25 Sep 2022)

Drawing Causal Inferences About Performance Effects in NLP
  Sandra Wankmüller (14 Sep 2022)

Heuristic-free Optimization of Force-Controlled Robot Search Strategies in Stochastic Environments
  Bastian Alt, Darko Katic, Rainer Jäkel, Michael Beetz (15 Jul 2022)

Zero-shot Cross-lingual Transfer is Under-specified Optimization
  Shijie Wu, Benjamin Van Durme, Mark Dredze (12 Jul 2022)

Pretrained Models for Multilingual Federated Learning
  Orion Weller, Marc Marone, Vladimir Braverman, Dawn J Lawrie, Benjamin Van Durme (06 Jun 2022)

Can Foundation Models Help Us Achieve Perfect Secrecy?
  Simran Arora, Christopher Ré (27 May 2022)

Linear Connectivity Reveals Generalization Strategies
  Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra (24 May 2022)

ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
  Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi (24 May 2022)

Calibration of Natural Language Understanding Models with Venn-ABERS Predictors
  Patrizio Giovannotti (21 May 2022)

Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction
  Manikandan Ravikiran, Bharathi Raja Chakravarthi (12 May 2022)

Few-shot Mining of Naturally Occurring Inputs and Outputs
  Mandar Joshi, Terra Blevins, M. Lewis, Daniel S. Weld, Luke Zettlemoyer (09 May 2022)

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis
  Sandra Wankmüller (03 May 2022)

UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles for Detecting Patronizing and Condescending Language
  David Koleczek, Alexander Scarlatos, Siddha Makarand Karkare, Preshma Linet Pereira (18 Apr 2022)

Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
  Christopher Hidey, Fei Liu, Rahul Goel (10 Apr 2022)

CoCoSoDa: Effective Contrastive Learning for Code Search
  Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun (07 Apr 2022)

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
  Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Ves Stoyanov, Majid Yazdani (03 Apr 2022)

Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?
  Subhabrata Dutta, Jeevesh Juneja, Dipankar Das, Tanmoy Chakraborty (24 Mar 2022)

Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
  Guanzheng Chen, Fangyu Liu, Zaiqiao Meng, Shangsong Liang (16 Feb 2022)

Transformer-based Models of Text Normalization for Speech Applications
  Jae Hun Ro, Felix Stahlberg, Ke Wu, Shankar Kumar (01 Feb 2022)