Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

4 November 2016
Hakan Inan
Khashayar Khosravi
R. Socher

Papers citing "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"

Showing 50 of 67 citing papers.
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
Masked Generative Priors Improve World Models Sequence Modelling Capabilities
Cristian Meo
Mircea Lica
Zarif Ikram
Akihiro Nakano
Vedant Shah
Aniket Didolkar
Dianbo Liu
Anirudh Goyal
Justin Dauwels
OffRL
90
0
0
10 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
147
0
0
07 Oct 2024
What makes math problems hard for reinforcement learning: a case study
Ali Shehper
A. Medina-Mardones
Lucas Fagan
Angus Gruen
Piotr Kucharski
Sergei Gukov
Zhenghan Wang
32
3
0
27 Aug 2024
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
27
50
0
03 Dec 2023
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Adithya Renduchintala
Tugrul Konuk
Oleksii Kuchaiev
MoMe
23
41
0
16 Nov 2023
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
Neha Verma
Kenton W. Murray
Kevin Duh
14
0
0
23 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
Christos Baziotis
Biao Zhang
Alexandra Birch
Barry Haddow
30
2
0
23 May 2023
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization
Yi-Syuan Chen
Yun-Zhu Song
Hong-Han Shuai
33
6
0
24 Mar 2023
Generative Adversarial Training Can Improve Neural Language Models
Sajad Movahedi
A. Shakery
GAN
AI4CE
34
2
0
02 Nov 2022
Bilingual Synchronization: Restoring Translational Relationships with Editing Operations
Jitao Xu
Josep Crego
François Yvon
30
4
0
24 Oct 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
16
10
0
20 Jul 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
27
4
0
19 May 2022
Joint Generation of Captions and Subtitles with Dual Decoding
Jitao Xu
François Buet
Josep Crego
Elise Bertin-Lemée
François Yvon
22
8
0
13 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
48
37
0
02 May 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
20
5
0
23 Mar 2022
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
38
55
0
15 Sep 2021
Training Graph Neural Networks with 1000 Layers
Guohao Li
Matthias Müller
V. Koltun
GNN
AI4CE
51
235
0
14 Jun 2021
Spectral Pruning for Recurrent Neural Networks
Takashi Furuya
Kazuma Suetake
K. Taniguchi
Hiroyuki Kusumoto
Ryuji Saiin
Tomohiro Daimon
27
4
0
23 May 2021
The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
Vassilina Nikoulina
Maxat Tezekbayev
Nuradil Kozhakhmet
Madina Babazhanova
Matthias Gallé
Z. Assylbekov
34
8
0
02 Mar 2021
Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press
Noah A. Smith
M. Lewis
230
89
0
31 Dec 2020
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
95
142
0
24 Oct 2020
Pruning Convolutional Filters using Batch Bridgeout
Najeeb Khan
Ian Stavness
23
3
0
23 Sep 2020
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi
Mario Fritz
WaLM
23
143
0
07 Sep 2020
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai
Nikolaos Pappas
Hao Peng
James Cross
Noah A. Smith
38
134
0
18 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
432
0
11 Jun 2020
An Overview of Neural Network Compression
James O'Neill
AI4CE
45
98
0
05 Jun 2020
rTop-k: A Statistical Estimation Approach to Distributed SGD
L. P. Barnes
Huseyin A. Inan
Berivan Isik
Ayfer Özgür
32
65
0
21 May 2020
Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning
Shaoxiong Ji
Wenqi Jiang
A. Walid
Xue Li
FedML
28
66
0
21 Mar 2020
ProGen: Language Modeling for Protein Generation
Ali Madani
Bryan McCann
Nikhil Naik
N. Keskar
N. Anand
Raphael R. Eguchi
Po-Ssu Huang
R. Socher
26
275
0
08 Mar 2020
A deep-learning view of chemical space designed to facilitate drug discovery
P. Maragakis
Hunter M. Nisonoff
B. Cole
D. Shaw
34
28
0
07 Feb 2020
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
19
68
0
26 Nov 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
11
87
0
10 Nov 2019
Federated Evaluation of On-device Personalization
Kangkang Wang
Rajiv Mathews
Chloé Kiddon
Hubert Eichner
F. Beaufays
Daniel Ramage
FedML
13
282
0
22 Oct 2019
Searching for A Robust Neural Architecture in Four GPU Hours
Xuanyi Dong
Yezhou Yang
20
646
0
10 Oct 2019
CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar
Bryan McCann
L. Varshney
Caiming Xiong
R. Socher
AI4CE
57
1,233
0
11 Sep 2019
Relating Simple Sentence Representations in Deep Neural Networks and the Brain
Sharmistha Jat
Hao Tang
Partha P. Talukdar
Tom Michael Mitchell
22
21
0
27 Jun 2019
Language Models with Transformers
Chenguang Wang
Mu Li
Alex Smola
15
120
0
20 Apr 2019
Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization
Yangyang Shi
M. Hwang
X. Lei
Haoyu Sheng
26
25
0
08 Apr 2019
Context Vectors are Reflections of Word Vectors in Half the Dimensions
Z. Assylbekov
Rustem Takhanov
6
10
0
26 Feb 2019
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
O. Ganea
Sylvain Gelly
Gary Bécigneul
Aliaksei Severyn
21
18
0
21 Feb 2019
Learning Private Neural Language Modeling with Attentive Aggregation
Shaoxiong Ji
Shirui Pan
Guodong Long
Xue Li
Jing Jiang
Zi Huang
FedML
MoMe
16
136
0
17 Dec 2018
Federated Learning for Mobile Keyboard Prediction
Andrew Straiton Hard
Kanishka Rao
Zhifeng Lin
Swaroop Indra Ramaswamy
Youjie Li
S. Augenstein
A. Schwing
M. Annavaram
A. Avestimehr
FedML
9
1,510
0
08 Nov 2018
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Alessandro Sordoni
Aaron Courville
32
322
0
22 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
21
387
0
28 Sep 2018
Distilled Wasserstein Learning for Word Embedding and Topic Modeling
Hongteng Xu
Wenlin Wang
Wen Liu
Lawrence Carin
MedIm
FedML
32
84
0
12 Sep 2018
Direct Output Connection for a High-Rank Language Model
Sho Takase
Jun Suzuki
Masaaki Nagata
18
36
0
30 Aug 2018
Pyramidal Recurrent Unit for Language Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
21
10
0
27 Aug 2018
Neural Document Summarization by Jointly Learning to Score and Select Sentences
Qingyu Zhou
Nan Yang
Furu Wei
Shaohan Huang
M. Zhou
T. Zhao
20
320
0
06 Jul 2018
GILE: A Generalized Input-Label Embedding for Text Classification
Nikolaos Pappas
James Henderson
AI4TS
AILaw
VLM
27
79
0
16 Jun 2018