ResearchTrend.AI
Regularizing and Optimizing LSTM Language Models

7 August 2017
Stephen Merity
N. Keskar
R. Socher

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 508 papers shown
Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation
Xuan Zhang
Libin Shen
Disheng Pan
Liangguo Wang
Yanjun Miao
19
1
0
10 Mar 2022
Parallel Spatio-Temporal Attention-Based TCN for Multivariate Time Series Prediction
Fan Jin
Ke Zhang
Huang Yipan
Yifei Zhu
Baiping Chen
AI4TS
11
173
0
02 Mar 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
N. Kim
6
19
0
02 Feb 2022
LiteLSTM Architecture for Deep Recurrent Neural Networks
Nelly Elsayed
Zag ElSayed
Anthony Maida
32
5
0
27 Jan 2022
Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu
Yuntian Deng
Alexander M. Rush
BDL
29
13
0
08 Jan 2022
The Importance of the Current Input in Sequence Modeling
Christian Oliva
Luis F. Lago-Fernández
3DV
14
1
0
22 Dec 2021
Predicting Media Memorability: Comparing Visual, Textual and Auditory Features
Lorin Sweeney
Graham Healy
A. Smeaton
8
5
0
15 Dec 2021
FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
B. Pung
Alvin Chan
11
0
0
28 Nov 2021
Altering Backward Pass Gradients improves Convergence
Bishshoy Das
M. Mondal
Brejesh Lall
S. Joshi
Sumantra Dutta Roy
14
0
0
24 Nov 2021
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages
Karthik Puranik
B. Bharathi
Senthil Kumar B
27
7
0
15 Nov 2021
Gradients are Not All You Need
Luke Metz
C. Freeman
S. Schoenholz
Tal Kachman
28
93
0
10 Nov 2021
Preventing posterior collapse in variational autoencoders for text generation via decoder regularization
Alban Petit
Caio Corro
DRL
13
3
0
28 Oct 2021
Paradigm Shift in Language Modeling: Revisiting CNN for Modeling Sanskrit Originated Bengali and Hindi Language
C. R. Rahman
Md. Hasibur Rahman
Mohammad Rafsan
S. Zakir
Mohammed Eunus Ali
Rafsanjani Muhammod
13
1
0
25 Oct 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Fei Wu
Jiwei Li
LRM
16
37
0
17 Oct 2021
Back from the future: bidirectional CTC decoding using future information in speech recognition
Namkyu Jung
Geon-min Kim
Han-Gyu Kim
31
3
0
07 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
34
22
0
07 Oct 2021
Capturing Structural Locality in Non-parametric Language Models
Frank F. Xu
Junxian He
Graham Neubig
Vincent J. Hellendoorn
19
14
0
06 Oct 2021
Spell my name: keyword boosted speech recognition
Namkyu Jung
Geon-min Kim
Joon Son Chung
43
13
0
06 Oct 2021
Autoregressive Diffusion Models
Emiel Hoogeboom
Alexey A. Gritsenko
Jasmijn Bastings
Ben Poole
Rianne van den Berg
Tim Salimans
DiffM
37
142
0
05 Oct 2021
Regularized Training of Nearest Neighbor Language Models
Jean-François Ton
Walter A. Talbott
Shuangfei Zhai
J. Susskind
RALM
17
3
0
16 Sep 2021
Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
Nuttapong Chairatanakul
Noppayut Sriwatanasakdi
Nontawat Charoenphakdee
Xin Liu
T. Murata
16
4
0
09 Sep 2021
Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
Sangwon Yu
Jongyoon Song
Heeseung Kim
SeongEun Lee
Woo-Jong Ryu
Sung-Hoon Yoon
9
31
0
07 Sep 2021
Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Brian DuSell
David Chiang
BDL
18
14
0
05 Sep 2021
LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training
Benjamin Clavié
Akshita Gheewala
Paul Briton
Marc Alphonsus
Rym Labiyaad
Francesco Piccoli
VLM
AILaw
24
2
0
02 Sep 2021
Representation Memorization for Fast Learning New Knowledge without Forgetting
Fei Mi
Tao R. Lin
Boi Faltings
CLL
11
0
0
28 Aug 2021
Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification
Bo Pang
Ying Nian Wu
19
18
0
26 Aug 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
44
37
0
25 Aug 2021
GAN Computers Generate Arts? A Survey on Visual Arts, Music, and Literary Text Generation using Generative Adversarial Network
Sakib Shahriar
GAN
39
101
0
09 Aug 2021
Towards Zero-shot Language Modeling
E. Ponti
Ivan Vulić
Ryan Cotterell
Roi Reichart
Anna Korhonen
22
19
0
06 Aug 2021
Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition
Xianrui Zheng
Chao Zhang
P. Woodland
6
46
0
29 Jul 2021
Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors
Anupama Chingacham
Vera Demberg
Dietrich Klakow
14
4
0
18 Jul 2021
From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text
Ishan Tarunesh
Syamantak Kumar
P. Jyothi
36
45
0
14 Jul 2021
R-Drop: Regularized Dropout for Neural Networks
Xiaobo Liang
Lijun Wu
Juntao Li
Yue Wang
Qi Meng
Tao Qin
Wei Chen
M. Zhang
Tie-Yan Liu
41
424
0
28 Jun 2021
Stabilizing Equilibrium Models by Jacobian Regularization
Shaojie Bai
V. Koltun
J. Zico Kolter
22
57
0
28 Jun 2021
Exploring Self-Identified Counseling Expertise in Online Support Forums
Allison Lahnala
Yuntian Zhao
Charles F Welch
Jonathan K. Kummerfeld
Lawrence C. An
Kenneth Resnicow
Rada Mihalcea
Verónica Pérez-Rosas
14
22
0
24 Jun 2021
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
Anup Sarma
Sonali Singh
Huaipan Jiang
Rui Zhang
M. Kandemir
Chita R. Das
11
1
0
22 Jun 2021
Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang
Xingyao Zhang
S. Song
Sara Hooker
25
75
0
22 Jun 2021
Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN
Haowei Jiang
Fei-wei Qin
Jin Cao
Yong Peng
Yanli Shao
LRM
ODL
16
42
0
22 Jun 2021
On the long-term learning ability of LSTM LMs
Wim Boes
Robbe Van Rompaey
Lyan Verwimp
J. Pelemans
Hugo Van hamme
P. Wambacq
13
1
0
16 Jun 2021
Modeling the Unigram Distribution
Irene Nikkarinen
Tiago Pimentel
Damián E. Blasi
Ryan Cotterell
21
8
0
04 Jun 2021
Language Model Evaluation Beyond Perplexity
Clara Meister
Ryan Cotterell
20
73
0
31 May 2021
Effective Batching for Recurrent Neural Network Grammars
Hiroshi Noji
Yohei Oseki
GNN
11
16
0
31 May 2021
Predictive Representation Learning for Language Modeling
Qingfeng Lan
Luke N. Kumar
Martha White
Alona Fyshe
OffRL
AI4TS
16
1
0
29 May 2021
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Dong-Young Lim
Sotirios Sabanis
21
11
0
28 May 2021
Automatic Construction of Sememe Knowledge Bases via Dictionaries
Fanchao Qi
Yangyi Chen
Fengyu Wang
Zhiyuan Liu
Xiao Chen
Maosong Sun
14
6
0
26 May 2021
Non-Autoregressive vs Autoregressive Neural Networks for System Identification
Daniel Weber
C. Gühmann
19
7
0
05 May 2021
Impact of Gender Debiased Word Embeddings in Language Modeling
Christine Basta
Marta R. Costa-jussá
21
4
0
03 May 2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Lorin Sweeney
Graham Healy
A. Smeaton
10
11
0
23 Apr 2021
IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always Hope in Transformers
Karthik Puranik
Adeep Hande
R. Priyadharshini
Sajeetha Thavareesan
Bharathi Raja Chakravarthi
15
59
0
19 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu (Allen) Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
33
36
0
16 Apr 2021