Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
26 September 2016 (arXiv:1609.07843)
[RALM]
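For context on the cited paper: the pointer sentinel mixture model interpolates a standard softmax-over-vocabulary distribution with a pointer distribution over words in the recent context, p(w) = g * p_vocab(w) + (1 - g) * p_ptr(w), where the gate g is the probability mass the pointer attention assigns to a learned sentinel. A minimal NumPy sketch of that mixing step (hypothetical function and variable names, not the authors' code):

    import numpy as np

    def pointer_sentinel_mixture(p_vocab, pointer_scores, sentinel_score, context_ids):
        # p_vocab: (V,) softmax distribution over the vocabulary
        # pointer_scores: (L,) attention logits over the L context positions
        # sentinel_score: scalar logit for the learned sentinel
        # context_ids: (L,) vocabulary ids of the context words
        logits = np.concatenate([pointer_scores, [sentinel_score]])
        attn = np.exp(logits - logits.max())
        attn /= attn.sum()                       # softmax over positions + sentinel
        ptr_mass, g = attn[:-1], attn[-1]        # sentinel's share is the gate g

        p_ptr = np.zeros_like(p_vocab)
        np.add.at(p_ptr, context_ids, ptr_mass)  # scatter pointer mass onto word ids
        return g * p_vocab + p_ptr               # pointer part already sums to 1 - g

The mixture sums to one, and rare words that occur in the recent context can receive probability directly through the pointer term even when the softmax component assigns them almost none.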

Papers citing "Pointer Sentinel Mixture Models"

Showing 50 of 577 citing papers:
• Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation. Xiang Lin, Simeng Han, Shafiq R. Joty. 14 Jun 2021.
• Going Beyond Linear Transformers with Recurrent Fast Weight Programmers. Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. 11 Jun 2021.
• Linguistically Informed Masking for Representation Learning in the Patent Domain. Sophia Althammer, Mark Buckley, Sebastian Hofstätter, Allan Hanbury. 10 Jun 2021.
• Differentiable Model Compression via Pseudo Quantization Noise. Alexandre Défossez, Yossi Adi, Gabriel Synnaeve. 20 Apr 2021. [DiffM, MQ]
• Broccoli: Sprinkling Lightweight Vocabulary Learning into Everyday Information Diets. Roland Aydin, Lars Klein, Arnaud Miribel, Robert West. 16 Apr 2021.
• Full Page Handwriting Recognition via Image to Sequence Extraction. Sumeet S. Singh, Sergey Karayev. 11 Mar 2021.
• The Rediscovery Hypothesis: Language Models Need to Meet Linguistics. Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhanova, Matthias Gallé, Z. Assylbekov. 02 Mar 2021.
• Linear Transformers Are Secretly Fast Weight Programmers. Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber. 22 Feb 2021.
• Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices. Yuhong Song, Weiwen Jiang, Bingbing Li, Panjie Qi, Qingfeng Zhuge, E. Sha, Sakyasingha Dasgupta, Yiyu Shi, Caiwen Ding. 12 Feb 2021.
• A Comprehensive Survey on Hardware-Aware Neural Architecture Search. Hadjer Benmeziane, K. E. Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, Naigang Wang. 22 Jan 2021.
• AutoDropout: Learning Dropout Patterns to Regularize Deep Networks. Hieu H. Pham, Quoc V. Le. 05 Jan 2021.
• Shortformer: Better Language Modeling using Shorter Inputs. Ofir Press, Noah A. Smith, M. Lewis. 31 Dec 2020.
• ERNIE-Doc: A Retrospective Long-Document Modeling Transformer. Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua-Hong Wu, Haifeng Wang. 31 Dec 2020.
• Towards Zero-Shot Knowledge Distillation for Natural Language Processing. Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh. 31 Dec 2020.
• CLEAR: Contrastive Learning for Sentence Representation. Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, Hao Ma. 31 Dec 2020. [SSL]
• Transformer Feed-Forward Layers Are Key-Value Memories. Mor Geva, R. Schuster, Jonathan Berant, Omer Levy. 29 Dec 2020. [KELM]
• A Theoretical Analysis of the Repetition Problem in Text Generation. Z. Fu, Wai Lam, Anthony Man-Cho So, Bei Shi. 29 Dec 2020.
• FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training. Y. Fu, Haoran You, Yang Katie Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin. 24 Dec 2020. [MQ]
• SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. Hanrui Wang, Zhekai Zhang, Song Han. 17 Dec 2020.
• Multi-Sense Language Modelling. Andrea Lekkas, Peter Schneider-Kamp, Isabelle Augenstein. 10 Dec 2020. [KELM]
• Efficient Estimation of Influence of a Training Instance. Sosuke Kobayashi, Sho Yokoi, Jun Suzuki, Kentaro Inui. 08 Dec 2020. [TDI]
• Adversarial Black-Box Attacks On Text Classifiers Using Multi-Objective Genetic Optimization Guided By Deep Networks. Alex Mathai, Shreya Khare, Srikanth G. Tamilselvam, Senthil Mani. 08 Nov 2020. [AAML]
• Concealed Data Poisoning Attacks on NLP Models. Eric Wallace, Tony Zhao, Shi Feng, Sameer Singh. 23 Oct 2020. [SILM]
• Limitations of Autoregressive Models and Their Alternatives. Chu-cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner. 22 Oct 2020.
• Are Some Words Worth More than Others? Shiran Dudy, Steven Bedrick. 12 Oct 2020.
• Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task. Dongyeop Kang, Eduard H. Hovy. 11 Oct 2020. [LRM]
• Knowledge-Enriched Distributional Model Inversion Attacks. Si-An Chen, Mostafa Kahla, R. Jia, Guo-Jun Qi. 08 Oct 2020.
• Learning to Recombine and Resample Data for Compositional Generalization. Ekin Akyürek, Afra Feyza Akyürek, Jacob Andreas. 08 Oct 2020.
• Multi-timescale Representation Learning in LSTM Language Models. Shivangi Mahto, Vy A. Vo, Javier S. Turek, Alexander G. Huth. 27 Sep 2020.
• Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches. Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martín-Fernández, Ismael Faro (IBM Quantum). 16 Sep 2020.
• Efficient Transformers: A Survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 14 Sep 2020. [VLM]
• Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling. Sophie Hao, S. Mendelsohn, Rachel Sterneck, Randi Martinez, Robert Frank. 08 Sep 2020.
• Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding. Sahar Abdelnabi, Mario Fritz. 07 Sep 2020. [WaLM]
• Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training. Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li. 28 Jul 2020. [ODL]
• FTRANS: Energy-Efficient Acceleration of Transformers using FPGA. Bingbing Li, Santosh Pandey, Haowen Fang, Yanjun Lyv, Ji Li, Jieyang Chen, Mimi Xie, Lipeng Wan, Hang Liu, Caiwen Ding. 16 Jul 2020. [AI4CE]
• Hopfield Networks is All You Need. Hubert Ramsauer, Bernhard Schafl, Johannes Lehner, Philipp Seidl, Michael Widrich, ..., David P. Kreil, Michael K Kopp, G. Klambauer, Johannes Brandstetter, Sepp Hochreiter. 16 Jul 2020.
• A Survey of Privacy Attacks in Machine Learning. M. Rigaki, Sebastian Garcia. 15 Jul 2020. [PILM, AAML]
• Term Revealing: Furthering Quantization at Run Time on Quantized DNNs. H. T. Kung, Bradley McDanel, S. Zhang. 13 Jul 2020. [MQ]
• Climbing the WOL: Training for Cheaper Inference. Zichang Liu, Zhaozhuo Xu, A. Ji, Jonathan Li, Beidi Chen, Anshumali Shrivastava. 02 Jul 2020. [TPM]
• Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures. Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala. 23 Jun 2020.
• The Depth-to-Width Interplay in Self-Attention. Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua. 22 Jun 2020.
• Categorical Normalizing Flows via Continuous Transformations. Phillip Lippe, E. Gavves. 17 Jun 2020. [BDL]
• Copy that! Editing Sequences by Copying Spans. Sheena Panthaplackel, Miltiadis Allamanis, Marc Brockschmidt. 08 Jun 2020. [BDL]
• Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. 08 Jun 2020.
• DeBERTa: Decoding-enhanced BERT with Disentangled Attention. Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. 05 Jun 2020. [AAML]
• Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model. Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig. 15 May 2020.
• A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type. T. Cohen, Serguei V. S. Pakhomov. 07 May 2020.
• DQI: Measuring Data Quality in NLP. Swaroop Mishra, Anjana Arunkumar, Bhavdeep Singh Sachdeva, Chris Bryan, Chitta Baral. 02 May 2020.
• BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA. Nora Kassner, Hinrich Schütze. 02 May 2020. [RALM]
• Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning. Shaoxiong Ji, Wenqi Jiang, A. Walid, Xue Li. 21 Mar 2020. [FedML]