Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation

Journal of Computer Science and Technology (JCST), 2020
24 February 2020
Yige Xu
Xipeng Qiu
L. Zhou
Xuanjing Huang
arXiv: 2002.10345
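
The title names a concrete recipe: during fine-tuning, the model's own recent parameter states are averaged into a "self-ensemble" teacher, and a self-distillation loss pulls the student's logits toward that teacher's logits. The PyTorch sketch below is a minimal illustration of that idea only, not the authors' released code; `model`, `train_loader`, the window size K, and the weight `lambda_kd` are hypothetical placeholders.

import copy
import torch
import torch.nn.functional as F

def fine_tune_with_self_distillation(model, train_loader, epochs=3,
                                     lr=2e-5, K=5, lambda_kd=1.0):
    # Self-ensemble: the teacher's weights are the average of the
    # student's weights over the last K optimization steps.
    # Self-distillation: an MSE term pulls the student's logits
    # toward the teacher's logits, on top of the task loss.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    teacher = copy.deepcopy(model)
    teacher.eval()
    recent = []  # parameter snapshots of the last K steps

    for _ in range(epochs):
        for inputs, labels in train_loader:
            logits = model(inputs)                    # student forward
            with torch.no_grad():
                teacher_logits = teacher(inputs)      # ensemble forward
            loss = (F.cross_entropy(logits, labels)
                    + lambda_kd * F.mse_loss(logits, teacher_logits))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Update the self-ensemble with the newest snapshot.
            recent.append({k: v.detach().clone()
                           for k, v in model.state_dict().items()})
            if len(recent) > K:
                recent.pop(0)
            avg = {k: (torch.stack([s[k] for s in recent]).mean(0)
                       if v.is_floating_point() else v)
                   for k, v in recent[-1].items()}
            teacher.load_state_dict(avg)
    return model

This corresponds roughly to the paper's parameter-averaged teacher (the "SDA" variant); the paper also evaluates a logit-averaged teacher ("SDV").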

Papers citing "Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation"

30 / 30 papers shown
MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation
Maike Behrendt
Stefan Sylvius Wagner
Stefan Harmeling
21 May 2025
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Neural Information Processing Systems (NeurIPS), 2024
Xiaonan Nie
Qibin Liu
Fangcheng Fu
Shenhan Zhu
Xupeng Miao
Xiaochen Li
Yanzhe Zhang
Shouda Liu
Tengjiao Wang
13 Nov 2024
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Neural Information Processing Systems (NeurIPS), 2024
Yu-Liang Zhan
Zhong-Yi Lu
Hao Sun
Ze-Feng Gao
10 Nov 2024
CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yeachan Kim
Junho Kim
SangKeun Lee
31 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
29 Oct 2024
SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Shivam Adarsh
Kumar Shridhar
Caglar Gulcehre
Nicholas Monath
Mrinmaya Sachan
24 Oct 2024
KPC-cF: Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering
Kibeom Nam
29 Jun 2024
Large Language Models for Relevance Judgment in Product Search
Navid Mehrdad
Hrushikesh Mohapatra
Mossaab Bagdouri
Prijith Chandran
Alessandro Magnani
...
Ajit Puthenputhussery
Sachin Yadav
Tony Lee
Chengxiang Zhai
Ciya Liao
01 Jun 2024
SurreyAI 2023 Submission for the Quality Estimation Shared Task
Conference on Machine Translation (WMT), 2023
Archchana Sindhujan
Helen Treharne
Constantin Orasan
Tharindu Ranasinghe
01 Dec 2023
Speculative Decoding with Big Little Decoder
Neural Information Processing Systems (NeurIPS), 2023
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
15 Feb 2023
Knowledge Distillation for Federated Learning: a Practical Guide
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Alessio Mora
Irene Tenison
Paolo Bellavista
Irina Rish
09 Nov 2022
Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation
Cody Blakeney
Jessica Zosa Forde
Jonathan Frankle
Ziliang Zong
Matthew L. Leavitt
01 Nov 2022
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Haw-Shiuan Chang
Ruei-Yao Sun
Kathryn Ricci
Andrew McCallum
10 Oct 2022
SEMI-FND: Stacked Ensemble Based Multimodal Inference For Faster Fake News Detection
Expert Systems with Applications (ESWA), 2022
Prabhav Singh
Ridam Srivastava
K. Rana
Vineet Kumar
17 May 2022
Unified Implicit Neural Stylization
European Conference on Computer Vision (ECCV), 2022
Zhiwen Fan
Lezhi Li
Peihao Wang
Xinyu Gong
Dejia Xu
Zinan Lin
05 Apr 2022
Unified and Effective Ensemble Knowledge Distillation
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
01 Apr 2022
Cluster & Tune: Boost Cold Start Performance in Text Classification
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Eyal Shnarch
Ariel Gera
Alon Halfon
Lena Dankin
Leshem Choshen
R. Aharonov
Noam Slonim
20 Mar 2022
BiBERT: Accurate Fully Binarized BERT
International Conference on Learning Representations (ICLR), 2022
Haotong Qin
Yifu Ding
Mingyuan Zhang
Qing Yan
Aishan Liu
Qingqing Dang
Ziwei Liu
Xianglong Liu
12 Mar 2022
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yoshitomo Matsubara
Luca Soldaini
Eric Lind
Alessandro Moschitti
15 Jan 2022
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
18 Nov 2021
Alternative Input Signals Ease Transfer in Multilingual Machine Translation
Simeng Sun
Angela Fan
James Cross
Vishrav Chaudhary
C. Tran
Philipp Koehn
Francisco Guzman
15 Oct 2021
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models
Qianchu Liu
Fangyu Liu
Nigel Collier
Anna Korhonen
Ivan Vulić
19 Sep 2021
Neighborhood Consensus Contrastive Learning for Backward-Compatible Representation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Shengsen Wu
Liang Chen
Yihang Lou
Yan Bai
Tao Bai
Minghua Deng
Ling-yu Duan
07 Aug 2021
Linking Common Vulnerabilities and Exposures to the MITRE ATT&CK Framework: A Self-Distillation Approach
Benjamin Ampel
Sagar Samtani
Steven Ullman
Hsinchun Chen
03 Aug 2021
Local-Global Knowledge Distillation in Heterogeneous Federated Learning with Non-IID Data
Dezhong Yao
Wanning Pan
Yutong Dai
Yao Wan
Xiaofeng Ding
Hai Jin
Zheng Xu
Lichao Sun
30 Jun 2021
An Automated Knowledge Mining and Document Classification System with Multi-model Transfer Learning
J. Chong
Zhiyuan Chen
Mei Shin Oh
24 Jun 2021
AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21
Danqing Zhu
Wangli Lin
Yang Zhang
Qiwei Zhong
Guanxiong Zeng
Weilin Wu
Jiayu Tang
11 Jan 2021
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach
Yue Yu
Simiao Zuo
Haoming Jiang
Wendi Ren
T. Zhao
Chao Zhang
15 Oct 2020
InfoMiner at WNUT-2020 Task 2: Transformer-based Covid-19 Informative Tweet Extraction
Hansi Hettiarachchi
Tharindu Ranasinghe
11 Oct 2020
To BAN or not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection
Cognitive Computation (Cogn Comput), 2020
Kristian Miok
Blaž Škrlj
D. Zaharie
Marko Robnik-Šikonja
10 Jul 2020