ResearchTrend.AI
arXiv: 1611.01462
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
4 November 2016
Hakan Inan, Khashayar Khosravi, R. Socher

Papers citing "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"

Showing 50 of 237 citing papers.
Data Augmentation Using Many-To-Many RNNs for Session-Aware Recommender Systems
Martín Baigorria Alonso
22 Aug 2021

Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation
Gabriel de Souza P. Moreira, Sara Rabhi, Ronay Ak, Md Yasin Kabir, Even Oldridge
11 Jul 2021

Training Graph Neural Networks with 1000 Layers
International Conference on Machine Learning (ICML), 2021
Guohao Li, Matthias Muller, V. Koltun
Topics: GNN, AI4CE
14 Jun 2021

Exploring Unsupervised Pretraining Objectives for Machine Translation
Findings, 2021
Christos Baziotis, Ivan Titov, Alexandra Birch, Barry Haddow
Topics: AAML, AI4CE
10 Jun 2021

Spectral Pruning for Recurrent Neural Networks
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Takashi Furuya, Kazuma Suetake, K. Taniguchi, Hiroyuki Kusumoto, Ryuji Saiin, Tomohiro Daimon
23 May 2021

Attention vs non-attention for a Shapley-based explanation method
Workshop on Knowledge Extraction and Integration for Deep Learning Architectures; Deep Learning Inside Out (DEELIO), 2021
T. Kersten, Hugh Mee Wong, Jaap Jumelet, Dieuwke Hupkes
26 Apr 2021
When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting
Vít Novotný, Michal Štefánik, E. F. Ayetiran, Petr Sojka, Radim Řehůřek
19 Apr 2021

Neural Architecture Search for Image Super-Resolution Using Densely Constructed Search Space: DeCoNAS
International Conference on Pattern Recognition (ICPR), 2021
Joonyoung Ahn, N. Cho
19 Apr 2021

Convex Aggregation for Opinion Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Hayate Iso, Xiaolan Wang, Yoshihiko Suhara, Stefanos Angelidis, W. Tan
03 Apr 2021

Finetuning Pretrained Transformers into RNNs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
24 Mar 2021

Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks
Liping Yuan, Jiangtao Feng, Xiaoqing Zheng, Xuanjing Huang
22 Mar 2021
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation
Heewoong Park, Sukhyun Cho, Jonghun Park
14 Mar 2021

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
Journal of Artificial Intelligence Research (JAIR), 2021
Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhanova, Matthias Gallé, Z. Assylbekov
02 Mar 2021

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices
Conference on Machine Learning and Systems (MLSys), 2021
Urmish Thakker, P. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse G. Beu
14 Feb 2021

Adaptive Semiparametric Language Models
Transactions of the Association for Computational Linguistics (TACL), 2021
Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong
Topics: KELM, RALM
04 Feb 2021

Shortformer: Better Language Modeling using Shorter Inputs
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ofir Press, Noah A. Smith, M. Lewis
31 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Machine Translation (MT), 2020
Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, Lucia Specia
13 Dec 2020

Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
27 Oct 2020

Rethinking embedding coupling in pre-trained language models
International Conference on Learning Representations (ICLR), 2020
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
24 Oct 2020

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation
Charles F Welch, Amélie Reymond, Jonathan K. Kummerfeld
Topics: AI4CE
29 Sep 2020

Grounded Compositional Outputs for Adaptive Language Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
Topics: KELM
24 Sep 2020
Pruning Convolutional Filters using Batch Bridgeout
IEEE Access, 2020
Najeeb Khan, Ian Stavness
23 Sep 2020

Local and Central Differential Privacy for Robustness and Privacy in Federated Learning
Network and Distributed System Security Symposium (NDSS), 2020
Mohammad Naseri, Jamie Hayes, Emiliano De Cristofaro
Topics: FedML
08 Sep 2020

Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
IEEE Symposium on Security and Privacy (IEEE S&P), 2020
Sahar Abdelnabi, Mario Fritz
Topics: WaLM
07 Sep 2020

DeLighT: Deep and Light-weight Transformer
Sachin Mehta, Marjan Ghazvininejad, Srini Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Topics: VLM
03 Aug 2020

A High-Quality Multilingual Dataset for Structured Documentation Translation
Conference on Machine Translation (WMT), 2020
Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, R. Socher, Caiming Xiong
24 Jun 2020

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
18 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Computer Vision and Pattern Recognition (CVPR), 2020
Karan Desai, Justin Johnson
Topics: SSL, VLM
11 Jun 2020

An Overview of Neural Network Compression
James O'Neill
Topics: AI4CE
05 Jun 2020

rTop-k: A Statistical Estimation Approach to Distributed SGD
L. P. Barnes, Huseyin A. Inan, Berivan Isik, Ayfer Özgür
21 May 2020

Staying True to Your Word: (How) Can Attention Become Explanation?
Martin Tutek, Jan Snajder
19 May 2020

Language Model Prior for Low-Resource Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Christos Baziotis, Barry Haddow, Alexandra Birch
30 Apr 2020

Preventing Posterior Collapse with Levenshtein Variational Autoencoder
Serhii Havrylov, Ivan Titov
Topics: DRL
30 Apr 2020

Fast and Memory-Efficient Neural Code Completion
IEEE Working Conference on Mining Software Repositories (MSR), 2020
Alexey Svyatkovskiy, Sebastian Lee, A. Hadjitofi, M. Riechert, Juliana Franco, Miltiadis Allamanis
28 Apr 2020
An Analysis of the Utility of Explicit Negative Examples to Improve the Syntactic Abilities of Neural Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Hiroshi Noji, Hiroya Takamura
06 Apr 2020

Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning
IEEE Intelligent Systems, 2020
Shaoxiong Ji, Wenqi Jiang, A. Walid, Xue Li
Topics: FedML
21 Mar 2020

PowerNorm: Rethinking Batch Normalization in Transformers
International Conference on Machine Learning (ICML), 2020
Sheng Shen, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
Topics: BDL
17 Mar 2020

ProGen: Language Modeling for Protein Generation
bioRxiv, 2020
Ali Madani, Bryan McCann, Nikhil Naik, N. Keskar, N. Anand, Raphael R. Eguchi, Po-Ssu Huang, R. Socher
08 Mar 2020

Modelling Latent Skills for Multitask Language Generation
Kris Cao, Dani Yogatama
21 Feb 2020

Assessing the Memory Ability of Recurrent Neural Networks
European Conference on Artificial Intelligence (ECAI), 2020
Cheng Zhang, Qiuchi Li, L. Hua, D. Song
18 Feb 2020
Transformer on a Diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alex Smola
14 Feb 2020

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges
ACM Computing Surveys (ACM CSUR), 2020
T. H. Le, Hao Chen, Muhammad Ali Babar
Topics: VLM
13 Feb 2020

Regularizing activations in neural networks via distribution matching with the Wasserstein metric
International Conference on Learning Representations (ICLR), 2020
Taejong Joo, Donggu Kang, Byunghoon Kim
13 Feb 2020

A deep-learning view of chemical space designed to facilitate drug discovery
Journal of Chemical Information and Modeling (JCIM), 2020
P. Maragakis, Hunter M. Nisonoff, B. Cole, D. Shaw
07 Feb 2020

Applying Recent Innovations from NLP to MOOC Student Course Trajectory Modeling
Educational Data Mining (EDM), 2020
Clare Chen, Z. Pardos
23 Jan 2020

CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul, B. Kijsirikul
Topics: ODL
24 Dec 2019
Pythia: AI-assisted Code Completion System
Knowledge Discovery and Data Mining (KDD), 2019
Alexey Svyatkovskiy, Ying Zhao, Shengyu Fu, Neel Sundaresan
29 Nov 2019

DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
International Conference on Learning Representations (ICLR), 2019
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
Topics: AI4TS
27 Nov 2019

Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
26 Nov 2019

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
International Conference on Learning Representations (ICLR), 2019
Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, R. Socher, Caiming Xiong
Topics: RALM, KELM, LRM
24 Nov 2019