Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
arXiv 2309.05444 | 11 September 2023
Ted Zadouri, A. Üstün, Arash Ahmadian, Beyza Ermiş, Acyr F. Locatelli, Sara Hooker
Tags: MoE

Papers citing "Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning" (30 of 80 shown)

Towards Incremental Learning in Large Language Models: A Critical Review
M. Jovanovic, Peter Voss | ELM, CLL, KELM | 28 Apr 2024

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang | MoE | 13 Apr 2024

Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning
Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang | MoE | 13 Apr 2024

SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning
Yuhang Zhou, Zeping Li, Siyu Tian, Yuchen Ni, Sen Liu, Guangnan Ye, Hongfeng Chai | 07 Apr 2024

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang | 21 Mar 2024

Conditional Computation in Neural Networks: Principles and Research Trends
Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, V. Marsocci, Pasquale Minervini, Jary Pomponi | 12 Mar 2024

Online Adaptation of Language Models with a Memory of Amortized Contexts
Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, Jonathan Richard Schwarz | KELM | 07 Mar 2024

Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence
Weixiang Zhao, Zhuojun Li, Shilong Wang, Yang Wang, Yulin Hu, Yanyan Zhao, Chen Wei, Bing Qin | 15 Feb 2024

LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild
Ziyu Zhao, Leilei Gan, Guoyin Wang, Wangchunshu Zhou, Hongxia Yang, Kun Kuang, Fei Wu | MoMe | 15 Feb 2024

Higher Layers Need More LoRA Experts
Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu, Daiyi Peng, Yawen Zhang, Xiaoyuan Guo, Jie Yang, V. Subrahmanian | MoE | 13 Feb 2024

Learning to Route Among Specialized Experts for Zero-Shot Generalization
Mohammed Muqeeth, Haokun Liu, Yufan Liu, Colin Raffel | MoMe | 08 Feb 2024

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Umberto Cappellazzo, Daniele Falavigna, A. Brutti | MoE | 01 Feb 2024

LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen, Zequn Jie, Lin Ma | MoE | 29 Jan 2024

OrchMoE: Efficient Multi-Adapter Learning with Task-Skill Synergy
Haowen Wang, Tao Sun, Kaixiang Ji, Jian Wang, Cong Fan, Jinjie Gu | 19 Jan 2024

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang | MoE, MLLM, VLM | 19 Dec 2023

From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge | 18 Dec 2023

Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane | 15 Dec 2023

MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning
Qizhe Zhang, Bocheng Zou, Ruichuan An, Jiaming Liu, Shanghang Zhang | MoE | 05 Dec 2023

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut | MoE | 01 Dec 2023

SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, ..., Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng | MoE | 15 Nov 2023

When MOE Meets LLMs: Parameter Efficient Fine-tuning for Multi-task Medical Applications
Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Derong Xu, Feng Tian, Yefeng Zheng | MoE | 21 Oct 2023

Label Supervised LLaMA Finetuning
Zongxi Li, Xianming Li, Yuzhang Liu, Haoran Xie, Jing Li, F. Wang, Qing Li, Xiaoqin Zhong | ALM | 02 Oct 2023

ConPET: Continual Parameter-Efficient Tuning for Large Language Models
Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Taojiannan Yang | CLL, KELM | 26 Sep 2023

Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth, Haokun Liu, Colin Raffel | MoMe, MoE | 06 Jun 2023

Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon | MoE | 18 Feb 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 28 Jan 2022

Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush | LRM | 15 Oct 2021

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat | MoE | 24 Sep 2021

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant | VPVLM | 18 Apr 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | 23 Jan 2020