arXiv:2210.14102
Exploring Mode Connectivity for Pre-trained Language Models

25 October 2022
Yujia Qin
Cheng Qian
Jing Yi
Weize Chen
Yankai Lin
Xu Han
Zhiyuan Liu
Maosong Sun
Jie Zhou

Papers citing "Exploring Mode Connectivity for Pre-trained Language Models"

20 papers shown.

  1. Understanding Machine Unlearning Through the Lens of Mode Connectivity. Jiali Cheng, Hadi Amiri. 08 Apr 2025.
  2. Training-Free Model Merging for Multi-target Domain Adaptation. Wenyi Li, Huan-ang Gao, Mingju Gao, Beiwen Tian, Rong Zhi, Hao Zhao. 18 Jul 2024.
  3. DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion. Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun. 03 Jun 2024.
  4. Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective. Pranshu Malviya, Jerry Huang, Quentin Fournier, Sarath Chandar. 24 May 2024.
  5. On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm. Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo-Wen Zhang, Junchi Yan. 06 Feb 2024.
  6. Merging by Matching Models in Task Parameter Subspaces. Derek Tam, Mohit Bansal, Colin Raffel. 07 Dec 2023.
  7. Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. Le Yu, Yu Bowen, Haiyang Yu, Fei Huang, Yongbin Li. 06 Nov 2023.
  8. Proving Linear Mode Connectivity of Neural Networks via Optimal Transport. Damien Ferbach, Baptiste Goujaud, Gauthier Gidel, Aymeric Dieuleveut. 29 Oct 2023.
  9. Deep Model Fusion: A Survey. Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen. 27 Sep 2023.
  10. Composing Parameter-Efficient Modules with Arithmetic Operations. Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He. 26 Jun 2023.
  11. Lookaround Optimizer: $k$ steps around, 1 step average. Jiangtao Zhang, Shunyu Liu, Jie Song, Tongtian Zhu, Zhenxing Xu, Mingli Song. 13 Jun 2023.
  12. Soft Merging of Experts with Adaptive Routing. Mohammed Muqeeth, Haokun Liu, Colin Raffel. 06 Jun 2023.
  13. Recyclable Tuning for Continual Pre-training. Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou. 15 May 2023.
  14. Knowledge is a Region in Weight Space for Fine-tuned Language Models. Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. 09 Feb 2023.
  15. Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization. Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz. 20 Dec 2022.
  16. ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning. Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. 02 Dec 2022.
  17. The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, Noah Constant. 18 Apr 2021.
  18. Extracting Training Data from Large Language Models. Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel. 14 Dec 2020.
  19. Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. Timo Schick, Hinrich Schütze. 21 Jan 2020.
  20. Language Models as Knowledge Bases? Fabio Petroni, Tim Rocktaschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. 03 Sep 2019.