ResearchTrend.AI
Regularizing and Optimizing LSTM Language Models

7 August 2017
Stephen Merity
N. Keskar
R. Socher

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 508 papers shown
Title
Improved Language Modeling by Decoding the Past
Siddhartha Brahma
BDL
AI4TS
4
6
0
14 Aug 2018
REGMAPR - Text Matching Made Easy
Siddhartha Brahma
VLM
14
1
0
13 Aug 2018
Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition
Antonio Jimeno Yepes
16
2
0
13 Aug 2018
Character-Level Language Modeling with Deeper Self-Attention
Rami Al-Rfou
Dokook Choe
Noah Constant
Mandy Guo
Llion Jones
20
386
0
09 Aug 2018
On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition
Hao Tang
James R. Glass
8
19
0
09 Jul 2018
DARTS: Differentiable Architecture Search
Hanxiao Liu
Karen Simonyan
Yiming Yang
6
4,297
0
24 Jun 2018
Insights on representational similarity in neural networks with canonical correlation
Ari S. Morcos
M. Raghu
Samy Bengio
DRL
18
429
0
14 Jun 2018
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Minjia Zhang
Xiaodong Liu
Wenhan Wang
Jianfeng Gao
Yuxiong He
23
30
0
11 Jun 2018
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Yikang Shen
Zhouhan Lin
Athul Paul Jacob
Alessandro Sordoni
Aaron Courville
Yoshua Bengio
17
91
0
11 Jun 2018
Towards Binary-Valued Gates for Robust LSTM Training
Zhuohan Li
Di He
Fei Tian
Wei-neng Chen
Tao Qin
Liwei Wang
Tie-Yan Liu
MQ
10
47
0
08 Jun 2018
Efficient Full-Matrix Adaptive Regularization
Naman Agarwal
Brian Bullins
Xinyi Chen
Elad Hazan
Karan Singh
Cyril Zhang
Yi Zhang
8
21
0
08 Jun 2018
GamePad: A Learning Environment for Theorem Proving
Daniel Huang
Prafulla Dhariwal
D. Song
Ilya Sutskever
18
109
0
02 Jun 2018
Incremental Natural Language Processing: Challenges, Strategies, and Evaluation
Arne Köhn
CLL
14
11
0
31 May 2018
Sigsoftmax: Reanalysis of the Softmax Bottleneck
Sekitoshi Kanai
Yasuhiro Fujiwara
Yuki Yamanaka
S. Adachi
9
68
0
28 May 2018
Stable Recurrent Models
John Miller
Moritz Hardt
11
116
0
25 May 2018
A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Alireza Sepas-Moghaddam
M. A. Haque
P. Correia
Kamal Nasrollahi
T. Moeslund
F. Pereira
CVBM
6
35
0
25 May 2018
Pushing the bounds of dropout
Gábor Melis
Charles Blundell
Tomás Kociský
Karl Moritz Hermann
Chris Dyer
Phil Blunsom
8
13
0
23 May 2018
Breaking the Activation Function Bottleneck through Adaptive Parameterization
Sebastian Flennerhag
Hujun Yin
J. Keane
Mark Elliot
14
12
0
22 May 2018
Improved Sentence Modeling using Suffix Bidirectional LSTM
Siddhartha Brahma
16
24
0
18 May 2018
Learning to Write with Cooperative Discriminators
Ari Holtzman
Jan Buys
Maxwell Forbes
Antoine Bosselut
David Golub
Yejin Choi
12
233
0
16 May 2018
Continuous Learning in a Hierarchical Multiscale Neural Network
Thomas Wolf
Julien Chaumond
Clement Delangue
CLL
AI4CE
NoLa
BDL
11
6
0
15 May 2018
Building Language Models for Text with Named Entities
Md. Rizwan Parvez
Saikat Chakraborty
Baishakhi Ray
Kai-Wei Chang
10
41
0
13 May 2018
Born Again Neural Networks
Tommaso Furlanello
Zachary Chase Lipton
Michael Tschannen
Laurent Itti
Anima Anandkumar
30
1,020
0
12 May 2018
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Urvashi Khandelwal
He He
Peng Qi
Dan Jurafsky
RALM
9
293
0
12 May 2018
State Gradients for RNN Memory Analysis
Lyan Verwimp
Hugo Van hamme
Vincent Renkens
P. Wambacq
6
6
0
11 May 2018
Noisin: Unbiased Regularization for Recurrent Neural Networks
Adji Bousso Dieng
Rajesh Ranganath
Jaan Altosaar
David M. Blei
17
22
0
03 May 2018
Assessing Language Models with Scaling Properties
Shuntaro Takahashi
Kumiko Tanaka-Ishii
ELM
LRM
14
2
0
24 Apr 2018
Dropping Networks for Transfer Learning
J. Ó. Neill
Danushka Bollegala
9
1
0
23 Apr 2018
Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model
Sabrina J. Mielke
Jason Eisner
LRM
BDL
8
33
0
23 Apr 2018
Training DNNs with Hybrid Block Floating Point
M. Drumond
Tao R. Lin
Martin Jaggi
Babak Falsafi
17
94
0
04 Apr 2018
Aggregated Momentum: Stability Through Passive Damping
James Lucas
Shengyang Sun
R. Zemel
Roger C. Grosse
16
67
0
01 Apr 2018
Meta-Learning a Dynamical Language Model
Thomas Wolf
Julien Chaumond
Clement Delangue
16
4
0
28 Mar 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
19
170
0
22 Mar 2018
Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Yeming Wen
Paul Vicol
Jimmy Ba
Dustin Tran
Roger C. Grosse
BDL
9
307
0
12 Mar 2018
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Shaojie Bai
J. Zico Kolter
V. Koltun
DRL
13
4,708
0
04 Mar 2018
Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning
Yichi Zhang
Zhijian Ou
17
0
0
01 Mar 2018
Memory-based Parameter Adaptation
Pablo Sprechmann
Siddhant M. Jayakumar
Jack W. Rae
Alexander Pritzel
Adria Puigdomenech Badia
Benigno Uria
Oriol Vinyals
Demis Hassabis
Razvan Pascanu
Charles Blundell
ODL
OOD
VLM
6
101
0
28 Feb 2018
Reusing Weights in Subword-aware Neural Language Models
Z. Assylbekov
Rustem Takhanov
18
4
0
23 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang-rui Liu
Ulfar Erlingsson
Jernej Kos
D. Song
45
1,111
0
22 Feb 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
14
11,483
0
15 Feb 2018
Neural Voice Cloning with a Few Samples
Sercan Ö. Arik
Jitong Chen
Kainan Peng
Wei Ping
Yanqi Zhou
11
380
0
14 Feb 2018
Efficient Neural Architecture Search via Parameter Sharing
Hieu H. Pham
M. Guan
Barret Zoph
Quoc V. Le
J. Dean
19
2,745
0
09 Feb 2018
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard
Sebastian Ruder
VLM
19
274
0
18 Jan 2018
Fix your classifier: the marginal value of training the last weight layer
Elad Hoffer
Itay Hubara
Daniel Soudry
27
101
0
14 Jan 2018
Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes
Cedric De Boom
Thomas Demeester
Bart Dhoedt
8
8
0
02 Jan 2018
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar
R. Socher
ODL
19
520
0
20 Dec 2017
A Flexible Approach to Automated RNN Architecture Generation
Martin Schrimpf
Stephen Merity
James Bradbury
R. Socher
19
15
0
20 Dec 2017
Characterizing the hyper-parameter space of LSTM language models for mixed context applications
Victor Akinwande
S. Remy
19
1
0
08 Dec 2017
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang
Zihang Dai
Ruslan Salakhutdinov
William W. Cohen
BDL
16
364
0
10 Nov 2017
Weighted Transformer Network for Machine Translation
Karim Ahmed
N. Keskar
R. Socher
25
133
0
06 Nov 2017