arXiv:1611.01462
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
4 November 2016
Hakan Inan
Khashayar Khosravi
R. Socher
Papers citing "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"
Showing 50 of 237 citing papers
SURFing to the Fundamental Limit of Jet Tagging
Ian Pang
D. Faroughy
David Shih
Ranit Das
Gregor Kasieczka
70
1
0
19 Nov 2025
Inverse Language Modeling towards Robust and Grounded LLMs
Davide Gabrielli
Simone Sestito
Iacopo Masi
61
0
0
02 Oct 2025
Semantic Fusion with Fuzzy-Membership Features for Controllable Language Modelling
Yongchao Huang
Hassan Raza
72
0
0
14 Sep 2025
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
Woojin Chung
Jeonghoon Kim
164
0
0
21 Aug 2025
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
Junseo Hwang
Wonguk Cho
Taesup Kim
215
0
0
26 May 2025
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
M. Doumbouya
Dan Jurafsky
Christopher D. Manning
FedML
163
1
0
21 May 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
360
0
0
10 Jan 2025
Masked Generative Priors Improve World Models Sequence Modelling Capabilities
Cristian Meo
Mircea Lica
Zarif Ikram
Akihiro Nakano
Vedant Shah
Aniket Didolkar
Dianbo Liu
Anirudh Goyal
Justin Dauwels
OffRL
754
5
0
10 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
International Conference on Learning Representations (ICLR), 2025
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
1.3K
2
0
07 Oct 2024
Stable Language Model Pre-training by Reducing Embedding Variability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Woojin Chung
Jiwoo Hong
Na Min An
James Thorne
Se-Young Yun
143
5
0
12 Sep 2024
What makes math problems hard for reinforcement learning: a case study
Ali Shehper
A. Medina-Mardones
Lucas Fagan
Angus Gruen
Piotr Kucharski
Zhenghan Wang
Sergei Gukov
139
6
0
27 Aug 2024
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
Neural Information Processing Systems (NeurIPS), 2024
R. Prabhakar
Hengrui Zhang
D. Wentzlaff
257
1
0
14 Aug 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
294
6
0
26 May 2024
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sander Land
Max Bartolo
243
34
0
08 May 2024
Language models scale reliably with over-training and on downstream tasks
International Conference on Learning Representations (ICLR), 2025
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
273
74
0
13 Mar 2024
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
322
26
0
28 Dec 2023
Dotless Representation of Arabic Text: Analysis and Modeling
Maged S. Al-Shaibani
Irfan Ahmad
151
1
0
26 Dec 2023
Balanced and Deterministic Weight-sharing Helps Network Performance
Oscar Chang
Hod Lipson
105
0
0
13 Dec 2023
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
International Conference on Learning Representations (ICLR), 2024
Gautam Reddy
260
87
0
03 Dec 2023
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Adithya Renduchintala
Tugrul Konuk
Oleksii Kuchaiev
MoMe
292
65
0
16 Nov 2023
Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks
Xinting Huang
Jiajing Wan
Ioannis Kritikos
Nora Hollenstein
194
3
0
31 Oct 2023
Neural Bradley-Terry Rating: Quantifying Properties from Comparisons
International Conference on Agents and Artificial Intelligence (ICAART), 2023
Satoru Fujii
256
0
0
24 Jul 2023
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
International Conference on Language Resources and Evaluation (LREC), 2023
Neha Verma
Kenton W. Murray
Kevin Duh
186
0
0
23 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Christos Baziotis
Biao Zhang
Alexandra Birch
Barry Haddow
361
2
0
23 May 2023
Extending Memory for Language Modelling
A. Nugaliyadde
KELM
CLL
VLM
132
1
0
19 May 2023
Tensor Decomposition for Model Reduction in Neural Networks: A Review
IEEE Circuits and Systems Magazine (IEEE CAS Magazine), 2023
Xingyi Liu
Keshab K. Parhi
168
23
0
26 Apr 2023
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Yi-Syuan Chen
Yun-Zhu Song
Hong-Han Shuai
112
6
0
24 Mar 2023
Coordinating Distributed Example Orders for Provably Accelerated Training
Neural Information Processing Systems (NeurIPS), 2023
A. Feder Cooper
Wentao Guo
Khiem Pham
Tiancheng Yuan
Charlie F. Ruan
Yucheng Lu
Chris De Sa
489
9
0
02 Feb 2023
ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhao Li
Xuebo Liu
Yang Li
Lidia S. Chao
Min Zhang
CLL
170
15
0
08 Dec 2022
Generative Adversarial Training Can Improve Neural Language Models
Sajad Movahedi
A. Shakery
GAN
AI4CE
136
2
0
02 Nov 2022
Bilingual Synchronization: Restoring Translational Relationships with Editing Operations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jitao Xu
Josep Crego
François Yvon
116
4
0
24 Oct 2022
FLCert: Provably Secure Federated Learning against Poisoning Attacks
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2022
Xiaoyu Cao
Zaixi Zhang
Jinyuan Jia
Neil Zhenqiang Gong
FedML
OOD
313
78
0
02 Oct 2022
Generalization in Neural Networks: A Broad Survey
Neurocomputing, 2022
Chris Rohlfs
OOD
AI4CE
213
16
0
04 Sep 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
IEEE Access, 2022
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
178
12
0
20 Jul 2022
Efficient recurrent architectures through activity sparsity and sparse back-propagation through time
International Conference on Learning Representations (ICLR), 2023
Anand Subramoney
Khaleelulla Khan Nazeer
Mark Schöne
Christian Mayr
David Kappel
263
29
0
13 Jun 2022
Multilingual Machine Translation with Hyper-Adapters
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Christos Baziotis
Mikel Artetxe
James Cross
Shruti Bhosale
235
27
0
22 May 2022
Twist Decoding: Diverse Generators Guide Each Other
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
135
5
0
19 May 2022
Joint Generation of Captions and Subtitles with Dual Decoding
International Workshop on Spoken Language Translation (IWSLT), 2022
Jitao Xu
François Buet
Josep Crego
Elise Bertin-Lemée
François Yvon
134
8
0
13 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
187
44
0
02 May 2022
Linearizing Transformer with Key-Value Memory
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yizhe Zhang
Deng Cai
255
6
0
23 Mar 2022
Relational Memory Augmented Language Models
Transactions of the Association for Computational Linguistics (TACL), 2022
Qi Liu
Dani Yogatama
Phil Blunsom
KELM
RALM
235
34
0
24 Jan 2022
Automatic Sparse Connectivity Learning for Neural Networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Zhimin Tang
Linkai Luo
Bike Xie
Yiyu Zhu
Rujie Zhao
Lvqing Bi
Chao Lu
231
46
0
13 Jan 2022
Frequency-Aware Contrastive Learning for Neural Machine Translation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Tong Zhang
Wei Ye
Baosong Yang
Long Zhang
Xingzhang Ren
Dayiheng Liu
Jinan Sun
Shikun Zhang
Haibo Zhang
Wen Zhao
152
34
0
29 Dec 2021
The Importance of the Current Input in Sequence Modeling
Artificial Intelligence Applications and Innovations (AIAI), 2021
Christian Oliva
Luis F. Lago-Fernández
3DV
91
1
0
22 Dec 2021
Hybrid Random Features
International Conference on Learning Representations (ICLR), 2022
K. Choromanski
Haoxian Chen
Han Lin
Yuanzhe Ma
Arijit Sehanobish
...
Andy Zeng
Valerii Likhosherstov
Dmitry Kalashnikov
Vikas Sindhwani
Adrian Weller
119
23
0
08 Oct 2021
One Source, Two Targets: Challenges and Rewards of Dual Decoding
Jitao Xu
François Yvon
187
6
0
21 Sep 2021
InvBERT: Reconstructing Text from Contextualized Word Embeddings by inverting the BERT pipeline
Emily M. Bender
Timnit Gebru
Eric Wallace
156
13
0
21 Sep 2021
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
194
55
0
15 Sep 2021
How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
S. Rajaee
Mohammad Taher Pilehvar
236
26
0
10 Sep 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
International Conference on Learning Representations (ICLR), 2022
Ofir Press
Noah A. Smith
M. Lewis
632
980
0
27 Aug 2021