Strategies for Training Large Vocabulary Neural Language Models


15 December 2015
Wenlin Chen
David Grangier
Michael Auli
    VLM
arXiv: 1512.04906

Papers citing "Strategies for Training Large Vocabulary Neural Language Models"

28 / 28 papers shown
Title
Nonparametric Masked Language Modeling
Sewon Min
Weijia Shi
M. Lewis
Xilun Chen
Wen-tau Yih
Hannaneh Hajishirzi
Luke Zettlemoyer
RALM
50
48
0
02 Dec 2022
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Andreas Grivas
Nikolay Bogoychev
Adam Lopez
15
9
0
12 Mar 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
29
15
0
11 Feb 2022
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
Bo Zheng
Li Dong
Shaohan Huang
Saksham Singhal
Wanxiang Che
Ting Liu
Xia Song
Furu Wei
VLM
21
22
0
15 Sep 2021
Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song
Pan Pan
Kang Zhao
Hao Yang
Yiming Chen
Yingya Zhang
Yinghui Xu
Rong Jin
40
23
0
09 Feb 2021
Pre-training Tasks for Embedding-based Large-scale Retrieval
Wei-Cheng Chang
Felix X. Yu
Yin-Wen Chang
Yiming Yang
Sanjiv Kumar
RALM
13
301
0
10 Feb 2020
Shared-Private Bilingual Word Embeddings for Neural Machine Translation
Xuebo Liu
Derek F. Wong
Yang Liu
Lidia S. Chao
Tong Xiao
Jingbo Zhu
35
37
0
07 Jun 2019
Error-Correcting Neural Sequence Prediction
James O'Neill
Danushka Bollegala
23
1
0
21 Jan 2019
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
Sachin Kumar
Yulia Tsvetkov
22
70
0
10 Dec 2018
Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling
Minghan Li
Tanli Zuo
Ruicheng Li
Martha White
Weishi Zheng
29
3
0
03 Dec 2018
Real-time Neural-based Input Method
Jiali Yao
Raphael Shu
Xinjian Li
K. Ohtsuki
Hideki Nakayama
6
4
0
19 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
26
387
0
28 Sep 2018
Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR
Yerbolat Khassanov
Chng Eng Siong
KELM
24
5
0
27 Jun 2018
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Minjia Zhang
Xiaodong Liu
Wenhan Wang
Jianfeng Gao
Yuxiong He
23
30
0
11 Jun 2018
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
Georgios P. Spithourakis
Sebastian Riedel
30
81
0
21 May 2018
Online normalizer calculation for softmax
Maxim Milakov
N. Gimelshein
14
84
0
08 May 2018
Fast Parametric Learning with Activation Memorization
Jack W. Rae
Chris Dyer
Peter Dayan
Timothy Lillicrap
KELM
41
46
0
27 Mar 2018
Accelerated Training for Massive Classification via Dynamic Class Selection
Xingcheng Zhang
Lei Yang
Junjie Yan
Dahua Lin
30
41
0
05 Jan 2018
Self-organized Hierarchical Softmax
Songlin Yang
Shawn Tan
C. Pal
Aaron Courville
BDL
38
7
0
26 Jul 2017
Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Z. Assylbekov
Rustem Takhanov
Bagdat Myrzakhmetov
Jonathan North Washington
32
17
0
20 Jul 2017
Fast Single-Class Classification and the Principle of Logit Separation
Gil Keren
Sivan Sabato
Björn Schuller
14
6
0
29 May 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
50
2,360
0
23 Dec 2016
Getting Started with Neural Models for Semantic Matching in Web Search
Kezban Dilek Onal
I. S. Altingövde
Pinar Senkul
Maarten de Rijke
VLM
3DV
31
9
0
08 Nov 2016
Vocabulary Selection Strategies for Neural Machine Translation
Gurvan L'Hostis
David Grangier
Michael Auli
22
41
0
01 Oct 2016
Efficient softmax approximation for GPUs
Edouard Grave
Armand Joulin
Moustapha Cissé
David Grangier
Hervé Jégou
25
270
0
14 Sep 2016
Generalizing and Hybridizing Count-based and Neural Language Models
Graham Neubig
Chris Dyer
64
31
0
01 Jun 2016
The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family
A. D. Brébisson
Pascal Vincent
17
10
0
29 Apr 2016
Recurrent Memory Networks for Language Modeling
Ke M. Tran
Arianna Bisazza
Christof Monz
30
21
0
06 Jan 2016