Efficient softmax approximation for GPUs

14 September 2016

Papers citing "Efficient softmax approximation for GPUs"

50 / 151 papers shown

Title
Scaling Up Collaborative Filtering Data Sets through Randomized Fractal Expansions Francois Belletti K. Lakshmanan Walid Krichene Nicolas Mayoraz Yi-Fan Chen John R. Anderson Taylor Robie Tayo Oguntebi Dan Shirron Amit Bleiwess 37 5 0 08 Apr 2019
Modeling Vocabulary for Big Code Machine Learning Hlib Babii Andrea Janes Romain Robbes 19 22 0 03 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling Myle Ott Sergey Edunov Alexei Baevski Angela Fan Sam Gross Nathan Ng David Grangier Michael Auli VLM FaML 23 3,130 0 01 Apr 2019
Cloze-driven Pretraining of Self-attention Networks Alexei Baevski Sergey Edunov Yinhan Liu Luke Zettlemoyer Michael Auli 10 198 0 19 Mar 2019
Maybe Deep Neural Networks are the Best Choice for Modeling Source Code Rafael-Michael Karampatsis Charles Sutton 32 54 0 13 Mar 2019
Efficient Contextual Representation Learning Without Softmax Layer Liunian Harold Li Patrick H. Chen Cho-Jui Hsieh Kai-Wei Chang 26 6 0 28 Feb 2019
Compressing Gradient Optimizers via Count-Sketches Ryan Spring Anastasios Kyrillidis Vijai Mohan Anshumali Shrivastava 14 35 0 01 Feb 2019
Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference Shun Liao Ting Chen Tian Lin Denny Zhou Chong-Jun Wang MoE 7 2 0 30 Jan 2019
Pay Less Attention with Lightweight and Dynamic Convolutions Felix Wu Angela Fan Alexei Baevski Yann N. Dauphin Michael Auli 11 604 0 29 Jan 2019
Error-Correcting Neural Sequence Prediction James O'Neill Danushka Bollegala 23 1 0 21 Jan 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Zihang Dai Zhilin Yang Yiming Yang J. Carbonell Quoc V. Le Ruslan Salakhutdinov VLM 38 3,679 0 09 Jan 2019
Attention-based sequence-to-sequence model for speech recognition: development of state-of-the-art system on LibriSpeech and its application to non-native English Yan Yin R. Prieto Bin Wang Jianwei Zhou Yiwei Gu Yang Liu Hui-Ching Lin 7 2 0 31 Oct 2018
Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks Patrick H. Chen Si Si Sanjiv Kumar Yang Li Cho-Jui Hsieh 16 21 0 29 Oct 2018
A no-regret generalization of hierarchical softmax to extreme multi-label classification Marek Wydmuch Kalina Jasinska Mikhail Kuznetsov R. Busa-Fekete Krzysztof Dembczyński 24 100 0 27 Oct 2018
Real-time Neural-based Input Method Jiali Yao Raphael Shu Xinjian Li K. Ohtsuki Hideki Nakayama 11 4 0 19 Oct 2018
Trellis Networks for Sequence Modeling Shaojie Bai J. Zico Kolter V. Koltun 25 145 0 15 Oct 2018
Adaptive Input Representations for Neural Language Modeling Alexei Baevski Michael Auli 26 388 0 28 Sep 2018
Adaptive Pruning of Neural Language Models for Mobile Devices Raphael Tang Jimmy J. Lin 21 6 0 27 Sep 2018
Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation X. Kong Qizhe Xie Zihang Dai Eduard H. Hovy 24 2 0 25 Sep 2018
Hard Non-Monotonic Attention for Character-Level Transduction Shijie Wu Pamela Shapiro Ryan Cotterell 8 42 0 29 Aug 2018
Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation Bin Wang Zhijian Ou 25 14 0 03 Jul 2018
Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR Yerbolat Khassanov Chng Eng Siong KELM 29 5 0 27 Jun 2018
Sigsoftmax: Reanalysis of the Softmax Bottleneck Sekitoshi Kanai Yasuhiro Fujiwara Yuki Yamanaka S. Adachi 19 68 0 28 May 2018
Learning to Write with Cooperative Discriminators Ari Holtzman Jan Buys Maxwell Forbes Antoine Bosselut David Golub Yejin Choi 31 234 0 16 May 2018
Adversarial Contrastive Estimation A. Bose Huan Ling Yanshuai Cao 13 56 0 09 May 2018
Interpretable Adversarial Perturbation in Input Embedding Space for Text Motoki Sato Jun Suzuki Hiroyuki Shindo Yuji Matsumoto 21 188 0 08 May 2018
Online normalizer calculation for softmax Maxim Milakov N. Gimelshein 27 84 0 08 May 2018
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling Liyuan Liu Xiang Ren Jingbo Shang Jian-wei Peng Jiawei Han 25 44 0 20 Apr 2018
Lightweight Adaptive Mixture of Neural and N-gram Language Models A. Bakhtin Arthur Szlam Marc'Aurelio Ranzato Edouard Grave 20 11 0 20 Apr 2018
Fast Parametric Learning with Activation Memorization Jack W. Rae Chris Dyer Peter Dayan Timothy Lillicrap KELM 41 46 0 27 Mar 2018
Unbiased scalable softmax optimization Francois Fagan G. Iyengar 6 12 0 22 Mar 2018
An Analysis of Neural Language Modeling at Multiple Scales Stephen Merity N. Keskar R. Socher 24 170 0 22 Mar 2018
Augment and Reduce: Stochastic Inference for Large Categorical Distributions Francisco J. R. Ruiz Michalis K. Titsias Adji Bousso Dieng David M. Blei BDL 19 22 0 12 Feb 2018
Accelerated Training for Massive Classification via Dynamic Class Selection Xingcheng Zhang Lei Yang Junjie Yan Dahua Lin 33 41 0 05 Jan 2018
Topic Compositional Neural Language Model Wenlin Wang Zhe Gan Wenqi Wang Dinghan Shen Jiaji Huang Ming-Yu Liu S. Satheesh Lawrence Carin 11 81 0 28 Dec 2017
Cavs: A Vertex-centric Programming Interface for Dynamic Neural Networks Huan Zhang Shizhen Xu Graham Neubig Wei-Ming Dai Qirong Ho Guangwen Yang Eric Xing GNN 28 3 0 11 Dec 2017
Adaptive Sampled Softmax with Kernel Based Sampling Guy Blanc Steffen Rendle BDL 14 73 0 02 Dec 2017
Slim Embedding Layers for Recurrent Neural Language Models Zhongliang Li Raymond Kulhanek Shaojun Wang Yunxin Zhao Shuang Wu KELM 27 23 0 27 Nov 2017
Unbounded cache model for online language modeling with open vocabulary Edouard Grave Moustapha Cissé Armand Joulin KELM CLL 18 62 0 07 Nov 2017
Self-organized Hierarchical Softmax Songlin Yang Shawn Tan C. Pal Aaron Courville BDL 38 7 0 26 Jul 2017
Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones Z. Assylbekov Rustem Takhanov Bagdat Myrzakhmetov Jonathan North Washington 38 17 0 20 Jul 2017
Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies Seppo Enarvi Peter Smit Sami Virpioja M. Kurimo 23 37 0 13 Jul 2017
TAPAS: Two-pass Approximate Adaptive Sampling for Softmax Yu Bai S. Goldman Li Zhang TPM 16 15 0 10 Jul 2017
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks Joan Serrà Alexandros Karatzoglou 28 52 0 13 Jun 2017
Fast Single-Class Classification and the Principle of Logit Separation Gil Keren Sivan Sabato Björn Schuller 21 6 0 29 May 2017
Autoregressive Convolutional Neural Networks for Asynchronous Time Series Mikolaj Binkowski Gautier Marti Philippe Donnat AI4TS BDL 43 149 0 12 Mar 2017
Language Modeling with Gated Convolutional Networks Yann N. Dauphin Angela Fan Michael Auli David Grangier 80 2,364 0 23 Dec 2016
Improving Neural Language Models with a Continuous Cache Edouard Grave Armand Joulin Nicolas Usunier KELM 11 300 0 13 Dec 2016
FastText.zip: Compressing text classification models Armand Joulin Edouard Grave Piotr Bojanowski Matthijs Douze Hervé Jégou Tomáš Mikolov MQ 25 1,190 0 12 Dec 2016
Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation Yacine Jernite A. Choromańska David Sontag 25 35 0 14 Oct 2016