arXiv: 1609.07843
Pointer Sentinel Mixture Models
26 September 2016
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
Papers citing "Pointer Sentinel Mixture Models"
50 / 577 papers shown
Title
Learning to Encode Position for Transformer with Continuous Dynamical Model
Xuanqing Liu
Hsiang-Fu Yu
Inderjit Dhillon
Cho-Jui Hsieh
8
107
0
13 Mar 2020
Temporal Convolutional Attention-based Network For Sequence Modeling
Hongyan Hao
Yan Wang
Siqiao Xue
Yudi Xia
Jian Zhao
S. Furao
16
41
0
28 Feb 2020
Statistical Adaptive Stochastic Gradient Methods
Pengchuan Zhang
Hunter Lang
Qiang Liu
Lin Xiao
ODL
11
11
0
25 Feb 2020
Limits of Detecting Text Generated by Large-Scale Language Models
L. Varshney
N. Keskar
R. Socher
DeLMO
16
18
0
09 Feb 2020
On the distance between two neural networks and the stability of learning
Jeremy Bernstein
Arash Vahdat
Yisong Yue
Ming-Yu Liu
ODL
200
57
0
09 Feb 2020
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
16
68
0
26 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
11
620
0
13 Nov 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
11
87
0
10 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
51
808
0
01 Nov 2019
On Generalization Bounds of a Family of Recurrent Neural Networks
Minshuo Chen
Xingguo Li
T. Zhao
11
70
0
28 Oct 2019
FineText: Text Classification via Attention-based Language Model Fine-tuning
Yunzhe Tao
Saurabh Gupta
Satyapriya Krishna
Xiong Zhou
Orchid Majumder
Vineet Khare
21
3
0
25 Oct 2019
Localization of Fake News Detection via Multitask Transfer Learning
Jan Christian Blaise Cruz
Julianne Agatha Tan
C. Cheng
23
33
0
21 Oct 2019
Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes
Yujia Qin
Fanchao Qi
Sicong Ouyang
Zhiyuan Liu
Cheng Yang
Yasheng Wang
Qun Liu
Maosong Sun
28
5
0
20 Oct 2019
Searching for A Robust Neural Architecture in Four GPU Hours
Xuanyi Dong
Yezhou Yang
8
646
0
10 Oct 2019
Kernel-Based Approaches for Sequence Modeling: Connections to Neural Methods
Kevin J Liang
Guoyin Wang
Yitong Li
Ricardo Henao
Lawrence Carin
27
2
0
09 Oct 2019
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs
Yi-Lin Tuan
Yun-Nung (Vivian) Chen
Hung-yi Lee
13
71
0
01 Oct 2019
A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld
Amir Rosenfeld
Yonatan Belinkov
Nir Shavit
16
205
0
27 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
22
584
0
25 Sep 2019
PaLM: A Hybrid Parser and Language Model
Hao Peng
Roy Schwartz
Noah A. Smith
AIMat
18
15
0
04 Sep 2019
On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression
Genta Indra Winata
Andrea Madotto
Jamin Shin
Elham J. Barezi
Pascale Fung
19
28
0
27 Aug 2019
Techniques for Automated Machine Learning
Yi-Wei Chen
Qingquan Song
Xia Hu
6
47
0
21 Jul 2019
Few-Shot Representation Learning for Out-Of-Vocabulary Words
Ziniu Hu
Ting-Li Chen
Kai-Wei Chang
Yizhou Sun
16
75
0
01 Jul 2019
Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling
Robert L. Logan IV
Nelson F. Liu
Matthew E. Peters
Matt Gardner
Sameer Singh
RALM
14
186
0
17 Jun 2019
A Lightweight Recurrent Network for Sequence Modeling
Biao Zhang
Rico Sennrich
27
7
0
30 May 2019
A framework for the extraction of Deep Neural Networks by leveraging public data
Soham Pal
Yash Gupta
Aditya Shukla
Aditya Kanade
S. Shevade
V. Ganapathy
FedML
MLAU
MIACV
19
56
0
22 May 2019
AMR Parsing as Sequence-to-Graph Transduction
Sheng Zhang
Xutai Ma
Kevin Duh
Benjamin Van Durme
16
148
0
21 May 2019
Adaptively Truncating Backpropagation Through Time to Control Gradient Bias
Christopher Aicher
N. Foti
E. Fox
MQ
22
32
0
17 May 2019
Probing What Different NLP Tasks Teach Machines about Function Word Comprehension
Najoung Kim
Roma Patel
Adam Poliak
Alex Jinpeng Wang
Patrick Xia
...
Alexis Ross
Tal Linzen
Benjamin Van Durme
Samuel R. Bowman
Ellie Pavlick
20
105
0
25 Apr 2019
Language Models with Transformers
Chenguang Wang
Mu Li
Alex Smola
10
120
0
20 Apr 2019
Pun Generation with Surprise
He He
Nanyun Peng
Percy Liang
25
68
0
15 Apr 2019
Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization
Yangyang Shi
M. Hwang
X. Lei
Haoyu Sheng
26
25
0
08 Apr 2019
Identifying and Reducing Gender Bias in Word-Level Language Models
Shikha Bordia
Samuel R. Bowman
FaML
9
323
0
05 Apr 2019
Low Resource Text Classification with ULMFit and Backtranslation
Sam Shleifer
VLM
11
57
0
21 Mar 2019
Asynchronous Federated Optimization
Cong Xie
Oluwasanmi Koyejo
Indranil Gupta
FedML
13
561
0
10 Mar 2019
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
O. Ganea
Sylvain Gelly
Gary Bécigneul
Aliaksei Severyn
21
18
0
21 Feb 2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
OODD
MQ
33
304
0
28 Jan 2019
Global-to-local Memory Pointer Networks for Task-Oriented Dialogue
Chien-Sheng Wu
R. Socher
Caiming Xiong
13
165
0
15 Jan 2019
Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems
Victor Makarenkov
L. Rokach
Bracha Shapira
16
35
0
08 Jan 2019
RNNs Implicitly Implement Tensor Product Representations
R. Thomas McCoy
Tal Linzen
Ewan Dunbar
P. Smolensky
46
54
0
20 Dec 2018
Learning Private Neural Language Modeling with Attentive Aggregation
Shaoxiong Ji
Shirui Pan
Guodong Long
Xue Li
Jing Jiang
Zi Huang
FedML
MoMe
16
136
0
17 Dec 2018
Can I trust you more? Model-Agnostic Hierarchical Explanations
Michael Tsang
Youbang Sun
Dongxu Ren
Yan Liu
FAtt
16
25
0
12 Dec 2018
Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
ODL
27
8
0
04 Dec 2018
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Alessandro Sordoni
Aaron Courville
26
322
0
22 Oct 2018
Fast deep reinforcement learning using online adjustments from the past
S. Hansen
Pablo Sprechmann
Alexander Pritzel
André Barreto
Charles Blundell
TTA
OffRL
OnRL
16
42
0
18 Oct 2018
Quasi-hyperbolic momentum and Adam for deep learning
Jerry Ma
Denis Yarats
ODL
81
129
0
16 Oct 2018
Trellis Networks for Sequence Modeling
Shaojie Bai
J. Zico Kolter
V. Koltun
17
145
0
15 Oct 2018
Learning Compressed Transforms with Low Displacement Rank
Anna T. Thomas
Albert Gu
Tri Dao
Atri Rudra
Christopher Ré
27
40
0
04 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
21
386
0
28 Sep 2018
Information-Weighted Neural Cache Language Models for ASR
Lyan Verwimp
J. Pelemans
Hugo Van hamme
P. Wambacq
KELM
RALM
9
2
0
24 Sep 2018
Direct Output Connection for a High-Rank Language Model
Sho Takase
Jun Suzuki
Masaaki Nagata
18
36
0
30 Aug 2018