ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Efficient softmax approximation for GPUs
arXiv:1609.04309 · 14 September 2016
Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Papers citing "Efficient softmax approximation for GPUs"

Showing 50 of the 151 citing papers.

Compact Recurrent Transformer with Persistent Memory (02 May 2025)
  Edison Mucllari, Z. Daniels, David C. Zhang, Qiang Ye [CLL, VLM]

Dynamic Embedded Topic Models: properties and recommendations based on diverse corpora (27 Apr 2025)
  Elisabeth Fittschen, Bella Xia, Leib Celnik, Paul Dilley, Tom Lippincott

Scalable Trajectory-User Linking with Dual-Stream Representation Networks (19 Mar 2025)
  Hao Zhang, Wei Chen, Xingyu Zhao, Jianpeng Qi, Guiyuan Jiang, Yanwei Yu

Scaling Embedding Layers in Language Models (03 Feb 2025)
  Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures (02 Feb 2025)
  Gabriel Lindenmaier, Sean Papay, Sebastian Padó

A Walsh Hadamard Derived Linear Vector Symbolic Architecture (30 Oct 2024)
  Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates, James Holt [LLMSV]

Large Vocabulary Size Improves Large Language Models (24 Jun 2024)
  Sho Takase, Ryokan Ri, Shun Kiyono, Takuya Kato

Optimized Speculative Sampling for GPU Hardware Accelerators (16 Jun 2024)
  Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, Korbinian Riedhammer, Tobias Bocklet

Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval (10 Jun 2024)
  Ravisri Valluri, Akash Kumar Mohankumar, Kushal Dave, Amit Singh, Jian Jiao, Manik Varma, Gaurav Sinha

From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks (09 May 2024)
  Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, ..., Zhenghua Chen, M. Aly, Jie Lin, Min-man Wu, Xiaoli Li

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation (30 Mar 2024)
  Marco Cognetta, Tatsuya Hiraoka, Naoaki Okazaki, Rico Sennrich, Yuval Pinter

How to Understand Named Entities: Using Common Sense for News Captioning (11 Mar 2024)
  Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models (05 Feb 2024)
  Michele Mastromattei, Fabio Massimo Zanzotto [VLM]

Memory-efficient Stochastic methods for Memory-based Transformers (14 Nov 2023)
  Vishwajit Kumar Vishnu, C. Sekhar

ViR: Towards Efficient Vision Retention Backbones (30 Oct 2023)
  Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz [GNN]

MobileNMT: Enabling Translation in 15MB and 30ms (07 Jun 2023)
  Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu [MQ]

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles (02 Jun 2023)
  Md Shamim Hussain, Mohammed J Zaki, D. Subramanian

Extending Memory for Language Modelling (19 May 2023)
  A. Nugaliyadde [KELM, CLL, VLM]

Massively Scaling Heteroscedastic Classifiers (30 Jan 2023)
  Mark Collier, Rodolphe Jenatton, Basil Mustafa, N. Houlsby, Jesse Berent, E. Kokiopoulou

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models (25 Jan 2023)
  Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa

Why do Nearest Neighbor Language Models Work? (07 Jan 2023)
  Frank F. Xu, Uri Alon, Graham Neubig [RALM]

Trajectory-User Linking Is Easier Than You Think (14 Dec 2022)
  Alameen Najjar, K. Mede

Meta-Learning Fast Weight Language Models (05 Dec 2022)
  Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi [KELM]

Nonparametric Masked Language Modeling (02 Dec 2022)
  Sewon Min, Weijia Shi, M. Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer [RALM]

SPOT: Knowledge-Enhanced Language Representations for Information Extraction (20 Aug 2022)
  Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian McAuley, Chun-Nan Hsu

Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU (14 Aug 2022)
  Hossam Amer, Young Jin Kim, Mohamed Afify, Hitokazu Matsushita, Hany Awadalla

Stable Invariant Models via Koopman Spectra (15 Jul 2022)
  Takuya Konishi, Yoshinobu Kawahara

Online Trajectory Prediction for Metropolitan Scale Mobility Digital Twin (21 Jun 2022)
  Z. Fan, Xiaojie Yang, Wei Yuan, Renhe Jiang, Quanjun Chen, Xuan Song, Ryosuke Shibasaki

Implicit N-grams Induced by Recurrence (05 May 2022)
  Xiaobing Sun, Wei Lu

Better Language Model with Hypernym Class Prediction (21 Mar 2022)
  Richard He Bai, Tong Wang, Alessandro Sordoni, Peng Shi

A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling (09 Mar 2022)
  Yike Zhang, Xiaobing Feng, Yi Y. Liu, Songjun Cao, Long Ma

Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval (28 Jan 2022)
  Uri Alon, Frank F. Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig [RALM]

Relational Memory Augmented Language Models (24 Jan 2022)
  Qi Liu, Dani Yogatama, Phil Blunsom [KELM, RALM]

How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN (18 Nov 2021)
  R. Thomas McCoy, P. Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz [SyDa]

Optimization with Constraint Learning: A Framework and Survey (05 Oct 2021)
  Adejuyigbe O. Fajemisin, Donato Maragno, D. Hertog

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification (01 Oct 2021)
  Jiong Zhang, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon

Regularized Training of Nearest Neighbor Language Models (16 Sep 2021)
  Jean-François Ton, Walter A. Talbott, Shuangfei Zhai, J. Susskind [RALM]

Bag of Tricks for Optimizing Transformer Efficiency (09 Sep 2021)
  Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu

Autoencoders as Tools for Program Synthesis (16 Aug 2021)
  Sander de Bruin, Vadim Liventsev, M. Petković

Improving Speech Recognition Accuracy of Local POI Using Geographical Models (07 Jul 2021)
  Songjun Cao, Yike Zhang, Xiaobing Feng, Long Ma

Hash Layers For Large Sparse Models (08 Jun 2021)
  Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston [MoE]

Not All Memories are Created Equal: Learning to Forget by Expiring (13 May 2021)
  Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan [CLL]

Prototype Memory for Large-scale Face Representation Learning (05 May 2021)
  Evgeny Smirnov, Nikita Garaev, V. Galyuk, Evgeny Lukyanets [CVBM]

Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors (16 Apr 2021)
  Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou

Revisiting Simple Neural Probabilistic Language Models (08 Apr 2021)
  Simeng Sun, Mohit Iyyer

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute (24 Feb 2021)
  Tao Lei [RALM, VLM]

Do Transformer Modifications Transfer Across Implementations and Applications? (23 Feb 2021)
  Sharan Narang, Hyung Won Chung, Yi Tay, W. Fedus, Thibault Févry, ..., Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

Adaptive Semiparametric Language Models (04 Feb 2021)
  Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong [KELM, RALM]

Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval (21 Dec 2020)
  Bhaskar Mitra

Grounded Compositional Outputs for Adaptive Language Modeling (24 Sep 2020)
  Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith [KELM]