Efficient softmax approximation for GPUs


14 September 2016
Edouard Grave
Armand Joulin
Moustapha Cissé
David Grangier
Hervé Jégou

Papers citing "Efficient softmax approximation for GPUs"

50 / 151 papers shown
Probabilistic Label Trees for Extreme Multi-label Classification
Kalina Jasinska-Kobus
Marek Wydmuch
Krzysztof Dembczyński
Mikhail Kuznetsov
R. Busa-Fekete
20
18
0
23 Sep 2020
F^2-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax
Byung-Ju Choi
Jimin Hong
D. Park
Sang Wan Lee
11
14
0
20 Sep 2020
A Survey on Large-scale Machine Learning
Meng Wang
Weijie Fu
Xiangnan He
Shijie Hao
Xindong Wu
22
109
0
10 Aug 2020
DeLighT: Deep and Light-weight Transformer
Sachin Mehta
Marjan Ghazvininejad
Srini Iyer
Luke Zettlemoyer
Hannaneh Hajishirzi
VLM
33
32
0
03 Aug 2020
OccamNet: A Fast Neural Model for Symbolic Regression at Scale
Owen Dugan
Rumen Dangovski
Allan dos Santos Costa
Samuel Kim
Pawan Goyal
J. Jacobson
M. Soljačić
26
11
0
16 Jul 2020
Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification
Hui Ye
Zhiyu Zoey Chen
Da-han Wang
Brian D. Davison
VLM
16
51
0
05 Jul 2020
Latent Video Transformer
Ruslan Rakhimov
Denis Volkhonskiy
Alexey Artemov
Denis Zorin
Evgeny Burnaev
VGen
33
119
0
18 Jun 2020
A Comprehensive Survey on Aspect Based Sentiment Analysis
K. Yadav
15
1
0
08 Jun 2020
Continual Representation Learning for Biometric Identification
Bo Zhao
Shixiang Tang
Dapeng Chen
Hakan Bilen
Rui Zhao
CLL
22
34
0
08 Jun 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
46
257
0
28 May 2020
Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
Tongzhou Wang
Phillip Isola
SSL
45
1,787
0
20 May 2020
Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model
Da-Rong Liu
Chunxi Liu
Frank Zhang
Gabriel Synnaeve
Yatharth Saraf
Geoffrey Zweig
28
19
0
15 May 2020
Stolen Probability: A Structural Weakness of Neural Language Models
David Demeter
Gregory J. Kimmel
Doug Downey
17
32
0
05 May 2020
Language Models as an Alternative Evaluator of Word Order Hypotheses: A Case Study in Japanese
Tatsuki Kuribayashi
Takumi Ito
Jun Suzuki
Kentaro Inui
9
5
0
02 May 2020
Augmenting Transformers with KNN-Based Composite Memory for Dialogue
Angela Fan
Claire Gardent
Chloé Braud
Antoine Bordes
RALM
47
75
0
27 Apr 2020
Lite Transformer with Long-Short Range Attention
Zhanghao Wu
Zhijian Liu
Ji Lin
Yujun Lin
Song Han
23
318
0
24 Apr 2020
Doubly-stochastic mining for heterogeneous retrieval
A. S. Rawat
A. Menon
Andreas Veit
Felix X. Yu
Sashank J. Reddi
Sanjiv Kumar
17
5
0
23 Apr 2020
Transform and Tell: Entity-Aware News Image Captioning
Alasdair Tran
A. Mathews
Lexing Xie
VLM
17
95
0
17 Apr 2020
Training with Quantization Noise for Extreme Model Compression
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Remi Gribonval
Hervé Jégou
Armand Joulin
MQ
24
242
0
15 Apr 2020
TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Matej Martinc
Blaž Škrlj
Senja Pollak
24
37
0
20 Mar 2020
The Implicit and Explicit Regularization Effects of Dropout
Colin Wei
Sham Kakade
Tengyu Ma
30
114
0
28 Feb 2020
Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan
Thibaut Lavril
Edouard Grave
Armand Joulin
Sainbayar Sukhbaatar
26
11
0
21 Feb 2020
Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models
Silin Gao
Zhijian Ou
Wei Yang
Huifang Xu
8
1
0
14 Feb 2020
Pre-training Tasks for Embedding-based Large-scale Retrieval
Wei-Cheng Chang
Felix X. Yu
Yin-Wen Chang
Yiming Yang
Sanjiv Kumar
RALM
13
301
0
10 Feb 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
25
0
0
22 Jan 2020
The Two-Pass Softmax Algorithm
Marat Dukhan
Artsiom Ablavatski
TPM
11
8
0
13 Jan 2020
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Xinjie Fan
Yizhe Zhang
Zhendong Wang
Mingyuan Zhou
BDL
9
4
0
31 Dec 2019
Deconstructing and reconstructing word embedding algorithms
Edward Newell
Kian Kenyon-Dean
Jackie C.K. Cheung
39
4
0
29 Nov 2019
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
AI4TS
38
23
0
27 Nov 2019
Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form
Kian Kenyon-Dean
27
3
0
06 Nov 2019
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
23
6,385
0
05 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
59
817
0
01 Nov 2019
An Empirical Study of Efficient ASR Rescoring with Transformers
Hongzhao Huang
Fuchun Peng
KELM
14
22
0
24 Oct 2019
Structured Pruning of Large Language Models
Ziheng Wang
Jeremy Wohlwend
Tao Lei
24
281
0
10 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
109
6,377
0
26 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
43
584
0
25 Sep 2019
A Random Gossip BMUF Process for Neural Language Modeling
Yiheng Huang
Jinchuan Tian
Lei Han
Guangsen Wang
Xingcheng Song
Dan Su
Dong Yu
9
3
0
19 Sep 2019
Towards Understanding Neural Machine Translation with Word Importance
Shilin He
Zhaopeng Tu
Xing Wang
Longyue Wang
Michael R. Lyu
Shuming Shi
AAML
20
39
0
01 Sep 2019
Latent Relation Language Models
Hiroaki Hayashi
Zecong Hu
Chenyan Xiong
Graham Neubig
KELM
29
42
0
21 Aug 2019
Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning
Lanqing He
Zhongdao Wang
Yali Li
Shengjin Wang
23
32
0
04 Aug 2019
Adaptive Noise Injection: A Structure-Expanding Regularization for RNN
Rui Li
Kai Shuang
Mengyu Gu
Sen Su
17
0
0
25 Jul 2019
Sampled Softmax with Random Fourier Features
A. S. Rawat
Jiecao Chen
Felix X. Yu
A. Suresh
Sanjiv Kumar
39
55
0
24 Jul 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Hervé Jégou
Armand Joulin
RALM
KELM
21
135
0
02 Jul 2019
Improving Neural Language Modeling via Adversarial Training
Dilin Wang
Chengyue Gong
Qiang Liu
AAML
43
115
0
10 Jun 2019
On the computational complexity of the probabilistic label tree algorithms
R. Busa-Fekete
Krzysztof Dembczyński
Alexander Golovnev
Kalina Jasinska
Mikhail Kuznetsov
M. Sviridenko
Chao Xu
TPM
29
3
0
01 Jun 2019
Interpretable Adversarial Training for Text
Samuel Barham
S. Feizi
AAML
21
17
0
30 May 2019
Using Ontologies To Improve Performance In Massively Multi-label Prediction Models
E. Steinberg
Peter J. Liu
NoLa
21
4
0
28 May 2019
Deep Residual Output Layers for Neural Language Generation
Nikolaos Pappas
James Henderson
26
7
0
14 May 2019
Dynamic Evaluation of Transformer Language Models
Ben Krause
Emmanuel Kahembwe
Iain Murray
Steve Renals
21
42
0
17 Apr 2019
Who Needs Words? Lexicon-Free Speech Recognition
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
8
27
0
09 Apr 2019