1708.02182
Cited By
Regularizing and Optimizing LSTM Language Models
7 August 2017
Stephen Merity
N. Keskar
R. Socher
Papers citing
"Regularizing and Optimizing LSTM Language Models"
50 / 508 papers shown
Title
Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation
Xuan Zhang
Libin Shen
Disheng Pan
Liangguo Wang
Yanjun Miao
19
1
0
10 Mar 2022
Parallel Spatio-Temporal Attention-Based TCN for Multivariate Time Series Prediction
Fan Jin
Ke Zhang
Huang Yipan
Yifei Zhu
Baiping Chen
AI4TS
11
173
0
02 Mar 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
N. Kim
6
19
0
02 Feb 2022
LiteLSTM Architecture for Deep Recurrent Neural Networks
Nelly Elsayed
Zag ElSayed
Anthony Maida
32
5
0
27 Jan 2022
Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu
Yuntian Deng
Alexander M. Rush
BDL
29
13
0
08 Jan 2022
The Importance of the Current Input in Sequence Modeling
Christian Oliva
Luis F. Lago-Fernández
3DV
14
1
0
22 Dec 2021
Predicting Media Memorability: Comparing Visual, Textual and Auditory Features
Lorin Sweeney
Graham Healy
A. Smeaton
8
5
0
15 Dec 2021
FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
B. Pung
Alvin Chan
11
0
0
28 Nov 2021
Altering Backward Pass Gradients improves Convergence
Bishshoy Das
M. Mondal
Brejesh Lall
S. Joshi
Sumantra Dutta Roy
14
0
0
24 Nov 2021
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages
Karthik Puranik
B. Bharathi
Senthil Kumar B
27
7
0
15 Nov 2021
Gradients are Not All You Need
Luke Metz
C. Freeman
S. Schoenholz
Tal Kachman
28
93
0
10 Nov 2021
Preventing posterior collapse in variational autoencoders for text generation via decoder regularization
Alban Petit
Caio Corro
DRL
13
3
0
28 Oct 2021
Paradigm Shift in Language Modeling: Revisiting CNN for Modeling Sanskrit Originated Bengali and Hindi Language
C. R. Rahman
Md. Hasibur Rahman
Mohammad Rafsan
S. Zakir
Mohammed Eunus Ali
Rafsanjani Muhammod
13
1
0
25 Oct 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Fei Wu
Jiwei Li
LRM
16
37
0
17 Oct 2021
Back from the future: bidirectional CTC decoding using future information in speech recognition
Namkyu Jung
Geon-min Kim
Han-Gyu Kim
31
3
0
07 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
34
22
0
07 Oct 2021
Capturing Structural Locality in Non-parametric Language Models
Frank F. Xu
Junxian He
Graham Neubig
Vincent J. Hellendoorn
19
14
0
06 Oct 2021
Spell my name: keyword boosted speech recognition
Namkyu Jung
Geon-min Kim
Joon Son Chung
43
13
0
06 Oct 2021
Autoregressive Diffusion Models
Emiel Hoogeboom
Alexey A. Gritsenko
Jasmijn Bastings
Ben Poole
Rianne van den Berg
Tim Salimans
DiffM
37
142
0
05 Oct 2021
Regularized Training of Nearest Neighbor Language Models
Jean-François Ton
Walter A. Talbott
Shuangfei Zhai
J. Susskind
RALM
17
3
0
16 Sep 2021
Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
Nuttapong Chairatanakul
Noppayut Sriwatanasakdi
Nontawat Charoenphakdee
Xin Liu
T. Murata
16
4
0
09 Sep 2021
Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
Sangwon Yu
Jongyoon Song
Heeseung Kim
SeongEun Lee
Woo-Jong Ryu
Sung-Hoon Yoon
9
31
0
07 Sep 2021
Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Brian DuSell
David Chiang
BDL
18
14
0
05 Sep 2021
LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training
Benjamin Clavié
Akshita Gheewala
Paul Briton
Marc Alphonsus
Rym Labiyaad
Francesco Piccoli
VLM
AILaw
24
2
0
02 Sep 2021
Representation Memorization for Fast Learning New Knowledge without Forgetting
Fei Mi
Tao R. Lin
Boi Faltings
CLL
11
0
0
28 Aug 2021
Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification
Bo Pang
Ying Nian Wu
19
18
0
26 Aug 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
44
37
0
25 Aug 2021
GAN Computers Generate Arts? A Survey on Visual Arts, Music, and Literary Text Generation using Generative Adversarial Network
Sakib Shahriar
GAN
39
101
0
09 Aug 2021
Towards Zero-shot Language Modeling
E. Ponti
Ivan Vulić
Ryan Cotterell
Roi Reichart
Anna Korhonen
22
19
0
06 Aug 2021
Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition
Xianrui Zheng
Chao Zhang
P. Woodland
6
46
0
29 Jul 2021
Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors
Anupama Chingacham
Vera Demberg
Dietrich Klakow
14
4
0
18 Jul 2021
From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text
Ishan Tarunesh
Syamantak Kumar
P. Jyothi
36
45
0
14 Jul 2021
R-Drop: Regularized Dropout for Neural Networks
Xiaobo Liang
Lijun Wu
Juntao Li
Yue Wang
Qi Meng
Tao Qin
Wei Chen
M. Zhang
Tie-Yan Liu
41
424
0
28 Jun 2021
Stabilizing Equilibrium Models by Jacobian Regularization
Shaojie Bai
V. Koltun
J. Zico Kolter
22
57
0
28 Jun 2021
Exploring Self-Identified Counseling Expertise in Online Support Forums
Allison Lahnala
Yuntian Zhao
Charles F Welch
Jonathan K. Kummerfeld
Lawrence C. An
Kenneth Resnicow
Rada Mihalcea
Verónica Pérez-Rosas
14
22
0
24 Jun 2021
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
Anup Sarma
Sonali Singh
Huaipan Jiang
Rui Zhang
M. Kandemir
Chita R. Das
11
1
0
22 Jun 2021
Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang
Xingyao Zhang
S. Song
Sara Hooker
25
75
0
22 Jun 2021
Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN
Haowei Jiang
Fei-wei Qin
Jin Cao
Yong Peng
Yanli Shao
LRM
ODL
16
42
0
22 Jun 2021
On the long-term learning ability of LSTM LMs
Wim Boes
Robbe Van Rompaey
Lyan Verwimp
J. Pelemans
Hugo Van hamme
P. Wambacq
13
1
0
16 Jun 2021
Modeling the Unigram Distribution
Irene Nikkarinen
Tiago Pimentel
Damián E. Blasi
Ryan Cotterell
21
8
0
04 Jun 2021
Language Model Evaluation Beyond Perplexity
Clara Meister
Ryan Cotterell
20
73
0
31 May 2021
Effective Batching for Recurrent Neural Network Grammars
Hiroshi Noji
Yohei Oseki
GNN
11
16
0
31 May 2021
Predictive Representation Learning for Language Modeling
Qingfeng Lan
Luke N. Kumar
Martha White
Alona Fyshe
OffRL
AI4TS
16
1
0
29 May 2021
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Dong-Young Lim
Sotirios Sabanis
21
11
0
28 May 2021
Automatic Construction of Sememe Knowledge Bases via Dictionaries
Fanchao Qi
Yangyi Chen
Fengyu Wang
Zhiyuan Liu
Xiao Chen
Maosong Sun
14
6
0
26 May 2021
Non-Autoregressive vs Autoregressive Neural Networks for System Identification
Daniel Weber
C. Gühmann
19
7
0
05 May 2021
Impact of Gender Debiased Word Embeddings in Language Modeling
Christine Basta
Marta R. Costa-jussá
21
4
0
03 May 2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Lorin Sweeney
Graham Healy
A. Smeaton
10
11
0
23 Apr 2021
IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always Hope in Transformers
Karthik Puranik
Adeep Hande
R. Priyadharshini
Sajeetha Thavareesan
Bharathi Raja Chakravarthi
15
59
0
19 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu (Allen) Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
33
36
0
16 Apr 2021