ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Efficient softmax approximation for GPUs (arXiv:1609.04309)

14 September 2016
Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Papers citing "Efficient softmax approximation for GPUs"

50 / 56 papers shown
Compact Recurrent Transformer with Persistent Memory
Edison Mucllari, Z. Daniels, David C. Zhang, Qiang Ye
CLL, VLM · 54 · 0 · 0 · 02 May 2025

Dynamic Embedded Topic Models: properties and recommendations based on diverse corpora
Elisabeth Fittschen, Bella Xia, Leib Celnik, Paul Dilley, Tom Lippincott
49 · 0 · 0 · 27 Apr 2025

Scaling Embedding Layers in Language Models
Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang
82 · 0 · 0 · 03 Feb 2025

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier, Sean Papay, Sebastian Padó
67 · 0 · 0 · 02 Feb 2025

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta, Tatsuya Hiraoka, Naoaki Okazaki, Rico Sennrich, Yuval Pinter
34 · 2 · 0 · 30 Mar 2024

MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu
MQ · 35 · 1 · 0 · 07 Jun 2023
Why do Nearest Neighbor Language Models Work?
Frank F. Xu, Uri Alon, Graham Neubig
RALM · 30 · 21 · 0 · 07 Jan 2023

Trajectory-User Linking Is Easier Than You Think
Alameen Najjar, K. Mede
29 · 3 · 0 · 14 Dec 2022

Meta-Learning Fast Weight Language Models
Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi
KELM · 32 · 13 · 0 · 05 Dec 2022

Nonparametric Masked Language Modeling
Sewon Min, Weijia Shi, M. Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
RALM · 50 · 48 · 0 · 02 Dec 2022

SPOT: Knowledge-Enhanced Language Representations for Information Extraction
Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian McAuley, Chun-Nan Hsu
30 · 15 · 0 · 20 Aug 2022

Stable Invariant Models via Koopman Spectra
Takuya Konishi, Yoshinobu Kawahara
23 · 3 · 0 · 15 Jul 2022
A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling
Yike Zhang, Xiaobing Feng, Yi Y. Liu, Songjun Cao, Long Ma
24 · 0 · 0 · 09 Mar 2022

Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Uri Alon, Frank F. Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig
RALM · 77 · 63 · 0 · 28 Jan 2022

How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
R. Thomas McCoy, P. Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz
SyDa · 25 · 119 · 0 · 18 Nov 2021

Optimization with Constraint Learning: A Framework and Survey
Adejuyigbe O. Fajemisin, Donato Maragno, D. Hertog
58 · 47 · 0 · 05 Oct 2021

Bag of Tricks for Optimizing Transformer Efficiency
Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu
34 · 6 · 0 · 09 Sep 2021

Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors
Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou
43 · 20 · 0 · 16 Apr 2021
A Survey on Large-scale Machine Learning
Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu
22 · 109 · 0 · 10 Aug 2020

Latent Video Transformer
Ruslan Rakhimov, Denis Volkhonskiy, Alexey Artemov, Denis Zorin, Evgeny Burnaev
VGen · 33 · 119 · 0 · 18 Jun 2020

A Comprehensive Survey on Aspect Based Sentiment Analysis
K. Yadav
13 · 1 · 0 · 08 Jun 2020

Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model
Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig
28 · 19 · 0 · 15 May 2020

Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, Song Han
23 · 318 · 0 · 24 Apr 2020

TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Matej Martinc, Blaž Škrlj, Senja Pollak
24 · 37 · 0 · 20 Mar 2020
Pre-training Tasks for Embedding-based Large-scale Retrieval
Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar
RALM · 13 · 301 · 0 · 10 Feb 2020

Deconstructing and reconstructing word embedding algorithms
Edward Newell, Kian Kenyon-Dean, Jackie C.K. Cheung
39 · 4 · 0 · 29 Nov 2019

Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, M. Lewis
RALM · 56 · 817 · 0 · 01 Nov 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL, AIMat · 109 · 6,377 · 0 · 26 Sep 2019

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin
43 · 584 · 0 · 25 Sep 2019

Towards Understanding Neural Machine Translation with Word Importance
Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael R. Lyu, Shuming Shi
AAML · 20 · 39 · 0 · 01 Sep 2019
Sampled Softmax with Random Fourier Features
A. S. Rawat, Jiecao Chen, Felix X. Yu, A. Suresh, Sanjiv Kumar
39 · 55 · 0 · 24 Jul 2019

Improving Neural Language Modeling via Adversarial Training
Dilin Wang, Chengyue Gong, Qiang Liu
AAML · 43 · 115 · 0 · 10 Jun 2019

On the computational complexity of the probabilistic label tree algorithms
R. Busa-Fekete, Krzysztof Dembczyński, Alexander Golovnev, Kalina Jasinska, Mikhail Kuznetsov, M. Sviridenko, Chao Xu
TPM · 29 · 3 · 0 · 01 Jun 2019

Using Ontologies To Improve Performance In Massively Multi-label Prediction Models
E. Steinberg, Peter J. Liu
NoLa · 21 · 4 · 0 · 28 May 2019

Maybe Deep Neural Networks are the Best Choice for Modeling Source Code
Rafael-Michael Karampatsis, Charles Sutton
26 · 54 · 0 · 13 Mar 2019

Error-Correcting Neural Sequence Prediction
James O'Neill, Danushka Bollegala
23 · 1 · 0 · 21 Jan 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
VLM · 38 · 3,679 · 0 · 09 Jan 2019
A no-regret generalization of hierarchical softmax to extreme multi-label classification
Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, R. Busa-Fekete, Krzysztof Dembczyński
24 · 100 · 0 · 27 Oct 2018

Real-time Neural-based Input Method
Jiali Yao, Raphael Shu, Xinjian Li, K. Ohtsuki, Hideki Nakayama
6 · 4 · 0 · 19 Oct 2018

Trellis Networks for Sequence Modeling
Shaojie Bai, J. Zico Kolter, V. Koltun
25 · 145 · 0 · 15 Oct 2018

Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
26 · 387 · 0 · 28 Sep 2018

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR
Yerbolat Khassanov, Chng Eng Siong
KELM · 24 · 5 · 0 · 27 Jun 2018

Sigsoftmax: Reanalysis of the Softmax Bottleneck
Sekitoshi Kanai, Yasuhiro Fujiwara, Yuki Yamanaka, S. Adachi
13 · 68 · 0 · 28 May 2018

Interpretable Adversarial Perturbation in Input Embedding Space for Text
Motoki Sato, Jun Suzuki, Hiroyuki Shindo, Yuji Matsumoto
18 · 188 · 0 · 08 May 2018
Online normalizer calculation for softmax
Maxim Milakov, N. Gimelshein
16 · 84 · 0 · 08 May 2018

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Liyuan Liu, Xiang Ren, Jingbo Shang, Jian-wei Peng, Jiawei Han
25 · 44 · 0 · 20 Apr 2018

Fast Parametric Learning with Activation Memorization
Jack W. Rae, Chris Dyer, Peter Dayan, Timothy Lillicrap
KELM · 41 · 46 · 0 · 27 Mar 2018

An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity, N. Keskar, R. Socher
24 · 170 · 0 · 22 Mar 2018

Accelerated Training for Massive Classification via Dynamic Class Selection
Xingcheng Zhang, Lei Yang, Junjie Yan, Dahua Lin
30 · 41 · 0 · 05 Jan 2018

Self-organized Hierarchical Softmax
Songlin Yang, Shawn Tan, C. Pal, Aaron Courville
BDL · 38 · 7 · 0 · 26 Jul 2017