Conditional Computation in Neural Networks for faster models (arXiv:1511.06297)
19 November 2015
Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup · AI4CE

Papers citing "Conditional Computation in Neural Networks for faster models"
50 of 62 citing papers shown
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Tam Minh Nguyen, Ngoc N. Tran, Khai Nguyen, Richard G. Baraniuk · MoE · 01 May 2025

Neural network task specialization via domain constraining
Roman Malashin, Daniil Ilyukhin · 28 Apr 2025

Switch-Based Multi-Part Neural Network
Surajit Majumder, Paritosh Ranjan, Prodip Roy, Bhuban Padhan · OOD · 25 Apr 2025

Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li · LRM · 13 Mar 2025

Tight Clusters Make Specialized Experts
Stefan K. Nielsen, R. Teo, Laziz U. Abdullaev, Tan M. Nguyen · MoE · 21 Feb 2025

Efficient Sparse Training with Structured Dropout
Andy Lo · BDL · 02 Nov 2024

More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier, Francisco Pereira, K. Wense, Lawrence E Hunter, Matt Jones · MoE · 10 Oct 2024

Video Relationship Detection Using Mixture of Experts
A. Shaabana, Zahra Gharaee, Paul Fieguth · 06 Mar 2024

Machine learning and domain decomposition methods -- a survey
A. Klawonn, M. Lanser, J. Weber · AI4CE · 21 Dec 2023

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou · LRM · 08 Dec 2023

Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
Xu Yao, Shuang Liang, Songqiao Han, Hailiang Huang · 06 Dec 2023
SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, ..., Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng · MoE · 15 Nov 2023

From Sparse to Soft Mixtures of Experts
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby · MoE · 02 Aug 2023

Learning When to Trust Which Teacher for Weakly Supervised ASR
Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, A. Stolcke · 21 Jun 2023

MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition
Huanzhang Dou, Pengyi Zhang, Wei Su, Yunlong Yu, Xi Li · CVBM · 06 Jun 2023

Neural Markov Jump Processes
Patrick Seifner, Ramses J. Sanchez · BDL · 31 May 2023

Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song · MoE · 20 May 2023

TIPS: Topologically Important Path Sampling for Anytime Neural Networks
Guihong Li, Kartikeya Bhardwaj, Yuedong Yang, R. Marculescu · AAML · 13 May 2023

Memorization Capacity of Neural Networks with Conditional Computation
Erdem Koyuncu · 20 Mar 2023

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng, Jaesong Lee, Shinji Watanabe · 14 Mar 2023

Towards Inference Efficient Deep Ensemble Learning
Ziyue Li, Kan Ren, Yifan Yang, Xinyang Jiang, Yuqing Yang, Dongsheng Li · BDL · 29 Jan 2023

Spatial Mixture-of-Experts
Nikoli Dryden, Torsten Hoefler · MoE · 24 Nov 2022
A Survey for Efficient Open Domain Question Answering
Qin Zhang, Shan Chen, Dongkuan Xu, Qingqing Cao, Xiaojun Chen, Trevor Cohn, Meng Fang · 15 Nov 2022

Neural Attentive Circuits
Nasim Rahaman, M. Weiß, Francesco Locatello, C. Pal, Yoshua Bengio, Bernhard Schölkopf, Erran L. Li, Nicolas Ballas · 14 Oct 2022

Neural Routing in Meta Learning
Jicang Cai, Saeed Vahidian, Weijia Wang, M. Joneidi, Bill Lin · 14 Oct 2022

A Survey of Neural Trees
Haoling Li, Jie Song, Mengqi Xue, Haofei Zhang, Jingwen Ye, Lechao Cheng, Mingli Song · AI4CE · 07 Sep 2022

Neural Implicit Dictionary via Mixture-of-Expert Training
Peihao Wang, Zhiwen Fan, Tianlong Chen, Zhangyang Wang · 08 Jul 2022

Switchable Representation Learning Framework with Self-compatibility
Shengsen Wu, Yan Bai, Yihang Lou, Xiongkun Linghu, Jianzhong He, Ling-yu Duan · 16 Jun 2022

Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe · SSL, AI4TS · 21 May 2022

Triangular Dropout: Variable Network Width without Retraining
Edward W. Staley, Jared Markowitz · 02 May 2022

Dynamic Multimodal Fusion
Zihui Xue, R. Marculescu · 31 Mar 2022

APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction
Bencheng Yan, Pengjie Wang, Kai Zhang, Feng Li, Hongbo Deng, Jian Xu, Bo Zheng · 30 Mar 2022
Unified Scaling Laws for Routed Language Models
Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, ..., Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan · MoE · 02 Feb 2022

Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, ..., Jeff Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Ves Stoyanov · MoE · 20 Dec 2021

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat · MoE · 24 Sep 2021

Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
Ben Saunders, Necati Cihan Camgöz, Richard Bowden · SLR · 23 Jul 2021

A Survey on Deep Learning Technique for Video Segmentation
Tianfei Zhou, Fatih Porikli, David J. Crandall, Luc Van Gool, Wenguan Wang · VOS · 02 Jul 2021

IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, A. Oliva · VLM, ViT · 23 Jun 2021

Scaling Vision with Sparse Mixture of Experts
C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 10 Jun 2021

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, A. Oliva, Rogerio Feris · 11 May 2021
Shapley Explanation Networks
Rui Wang, Xiaoqian Wang, David I. Inouye · TDI, FAtt · 06 Apr 2021

Contextual Dropout: An Efficient Sample-Dependent Dropout Module
Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, Xiaoning Qian, Mingyuan Zhou · OOD, BDL, UQCV · 06 Mar 2021

VA-RED$^2$: Video Adaptive Redundancy Reduction
Bowen Pan, Rameswar Panda, Camilo Luciano Fosco, Chung-Ching Lin, A. Andonian, Yue Meng, Kate Saenko, A. Oliva, Rogerio Feris · 15 Feb 2021

High-Capacity Expert Binary Networks
Adrian Bulat, Brais Martínez, Georgios Tzimiropoulos · MQ · 07 Oct 2020

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks
Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber · 05 Oct 2020

Multi-modal Experts Network for Autonomous Driving
Shihong Fang, A. Choromańska · MoE · 18 Sep 2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Z. Chen · MoE · 30 Jun 2020

Which scaling rule applies to Artificial Neural Networks
János Végh · 15 May 2020

Learning to Continually Learn
Shawn L. E. Beaulieu, Lapo Frati, Thomas Miconi, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Nick Cheney · KELM, CLL · 21 Feb 2020

Attention over Parameters for Dialogue Systems
Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Jamin Shin, Pascale Fung · 07 Jan 2020