Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

23 January 2017

Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"

50 / 495 papers shown

Title
The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research N. Ahmed Muntasir Wahed 23 106 0 22 Oct 2020
Anti-Distillation: Improving reproducibility of deep networks G. Shamir Lorenzo Coviello 42 20 0 19 Oct 2020
THIN: THrowable Information Networks and Application for Facial Expression Recognition In The Wild Estèphe Arnaud Arnaud Dapogny Kévin Bailly CVBM 29 23 0 15 Oct 2020
High-Capacity Expert Binary Networks Adrian Bulat Brais Martínez Georgios Tzimiropoulos MQ 27 57 0 07 Oct 2020
Scalable Transfer Learning with Expert Models J. Puigcerver C. Riquelme Basil Mustafa Cédric Renggli André Susano Pinto Sylvain Gelly Daniel Keysers N. Houlsby 34 62 0 28 Sep 2020
Pruning Convolutional Filters using Batch Bridgeout Najeeb Khan Ian Stavness 23 3 0 23 Sep 2020
Multi-modal Experts Network for Autonomous Driving Shihong Fang A. Choromańska MoE 20 5 0 18 Sep 2020
Anomaly Detection by Recombining Gated Unsupervised Experts Jan-Philipp Schulze Philip Sperl Konstantin Böttinger 29 1 0 31 Aug 2020
S2RMs: Spatially Structured Recurrent Modules Nasim Rahaman Anirudh Goyal Muhammad Waleed Gondal M. Wuthrich Stefan Bauer Yash Sharma Yoshua Bengio Bernhard Schölkopf 21 14 0 13 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding Dmitry Lepikhin HyoukJoong Lee Yuanzhong Xu Dehao Chen Orhan Firat Yanping Huang M. Krikun Noam M. Shazeer Z. Chen MoE 25 1,106 0 30 Jun 2020
Attention-based Quantum Tomography Peter Cha P. Ginsparg Felix Wu Juan Carrasquilla Peter L. McMahon Eun-Ah Kim 26 72 0 22 Jun 2020
The Depth-to-Width Interplay in Self-Attention Yoav Levine Noam Wies Or Sharir Hofit Bata Amnon Shashua 30 45 0 22 Jun 2020
Learning to Branch for Multi-Task Learning Pengsheng Guo Chen-Yu Lee Daniel Ulbricht 18 174 0 02 Jun 2020
Language Models are Few-Shot Learners Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 35 40,023 0 28 May 2020
Consistent and Flexible Selectivity Estimation for High-Dimensional Data Yaoshu Wang Chuan Xiao Jianbin Qin Rui Mao Onizuka Makoto Wei Wang Rui Zhang Yoshiharu Ishikawa 33 14 0 20 May 2020
TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain Weitao Li Pengfei Xu Yang Katie Zhao Haitong Li Yuan Xie Yingyan Lin 9 68 0 03 May 2020
Computation on Sparse Neural Networks: an Inspiration for Future Hardware Fei Sun Minghai Qin Tianyun Zhang Liu Liu Yen-kuang Chen Yuan Xie 29 7 0 24 Apr 2020
Conditional Channel Gated Networks for Task-Aware Continual Learning Davide Abati Jakub M. Tomczak Tijmen Blankevoort Simone Calderara Rita Cucchiara B. Bejnordi CLL 33 184 0 31 Mar 2020
Learning Dynamic Routing for Semantic Segmentation Yanwei Li Lin Song Yukang Chen Zeming Li Xinming Zhang Xingang Wang Jian Sun SSeg 88 163 0 23 Mar 2020
Resolution Adaptive Networks for Efficient Inference Le Yang Yizeng Han Xi Chen Shiji Song Jifeng Dai Gao Huang 24 215 0 16 Mar 2020
Sparse Sinkhorn Attention Yi Tay Dara Bahri Liu Yang Donald Metzler Da-Cheng Juan 23 330 0 26 Feb 2020
Learning to Continually Learn Shawn L. E. Beaulieu Lapo Frati Thomas Miconi Joel Lehman Kenneth O. Stanley Jeff Clune Nick Cheney KELM CLL 46 146 0 21 Feb 2020
A Survey of Deep Learning Techniques for Neural Machine Translation Shu Yang Yuxin Wang X. Chu VLM AI4TS AI4CE 22 138 0 18 Feb 2020
Adversarial Robustness for Code Pavol Bielik Martin Vechev AAML 16 89 0 11 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts Max Ryabinin Anton I. Gusev FedML 22 48 0 10 Feb 2020
Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results Xiaoxiao Li Yufeng Gu Nicha Dvornek Lawrence H. Staib P. Ventola James S. Duncan FedML OOD 10 352 0 16 Jan 2020
Attention over Parameters for Dialogue Systems Andrea Madotto Zhaojiang Lin Chien-Sheng Wu Jamin Shin Pascale Fung 30 20 0 07 Jan 2020
A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning Soochan Lee Junsoo Ha Dongsu Zhang Gunhee Kim BDL CLL 11 209 0 03 Jan 2020
Machine Unlearning Lucas Bourtoule Varun Chandrasekaran Christopher A. Choquette-Choo Hengrui Jia Adelin Travers Baiwu Zhang David Lie Nicolas Papernot MU 27 807 0 09 Dec 2019
AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning Yunhui Guo Yandong Li Liqiang Wang Tajana Simunic 35 41 0 21 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Colin Raffel Noam M. Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li Peter J. Liu AIMat 88 19,440 0 23 Oct 2019
Tree-gated Deep Mixture-of-Experts For Pose-robust Face Alignment Estèphe Arnaud Arnaud Dapogny Kévin Bailly CVBM 27 10 0 21 Oct 2019
Span Selection Pre-training for Question Answering Michael R. Glass A. Gliozzo Rishav Chakravarti Anthony Ferritto Lin Pan G P Shrivatsa Bhargav Dinesh Garg Avirup Sil RALM 38 70 0 09 Sep 2019
Situational Fusion of Visual Representation for Visual Navigation Bokui (William) Shen Danfei Xu Yuke Zhu Leonidas J. Guibas Fei-Fei Li Silvio Savarese SSL 24 62 0 24 Aug 2019
Deep Learning Based Chatbot Models Richard Csaky 29 46 0 23 Aug 2019
Convergence Rates for Gaussian Mixtures of Experts Nhat Ho Chiao-Yu Yang Michael I. Jordan 21 40 0 09 Jul 2019
4K-Memristor Analog-Grade Passive Crossbar Circuit Hyungjin Kim H. Nili Mahmood Mahmoodi D. Strukov 9 165 0 27 Jun 2019
Conditional Computation for Continual Learning Min-Bin Lin Jie Fu Yoshua Bengio CLL 18 10 0 16 Jun 2019
Lightweight Network Architecture for Real-Time Action Recognition Alexander Kozlov Vadim Andronov Y. Gritsenko ViT 25 33 0 21 May 2019
Priority-based Parameter Propagation for Distributed DNN Training Anand Jayarajan Jinliang Wei Garth A. Gibson Alexandra Fedorova Gennady Pekhimenko AI4CE 19 178 0 10 May 2019
Routing Networks and the Challenges of Modular and Compositional Computation Clemens Rosenbaum Ignacio Cases Matthew D Riemer Tim Klinger 37 78 0 29 Apr 2019
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning Tom Schaul Diana Borsa Joseph Modayil Razvan Pascanu 11 63 0 25 Apr 2019
Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning Dimitrije Jankov Shangyu Luo Binhang Yuan Zhuhua Cai Jia Zou C. Jermaine Zekai J. Gao 23 60 0 25 Apr 2019
Model Slicing for Supporting Complex Analytics with Elastic Inference Cost and Resource Constraints Shaofeng Cai Gang Chen Beng Chin Ooi Jinyang Gao 25 19 0 03 Apr 2019
Many Task Learning with Task Routing Gjorgji Strezoski Nanne van Noord M. Worring MoE 24 96 0 28 Mar 2019
Continual Learning via Neural Pruning Siavash Golkar Michael Kagan Kyunghyun Cho CLL 21 158 0 11 Mar 2019
Mixture Models for Diverse Machine Translation: Tricks of the Trade T. Shen Myle Ott Michael Auli Marc'Aurelio Ranzato MoE 33 148 0 20 Feb 2019
Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems Victor Makarenkov Lior Rokach Bracha Shapira 16 35 0 08 Jan 2019
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos Hazel Doughty W. Mayol-Cuevas Dima Damen 30 138 0 13 Dec 2018
Deep Positron: A Deep Neural Network Using the Posit Number System Zachariah Carmichael Seyed Hamed Fatemi Langroudi Char Khazanov Jeffrey Lillie J. Gustafson Dhireesha Kudithipudi MQ 9 96 0 05 Dec 2018