Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

4 November 2016
Hakan Inan
Khashayar Khosravi
R. Socher

Papers citing "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"

Showing 50 of 67 citing papers.
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
Masked Generative Priors Improve World Models Sequence Modelling Capabilities
Cristian Meo
Mircea Lica
Zarif Ikram
Akihiro Nakano
Vedant Shah
Aniket Didolkar
Dianbo Liu
Anirudh Goyal
Justin Dauwels
OffRL
90
0
0
10 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
147
0
0
07 Oct 2024
What makes math problems hard for reinforcement learning: a case study
Ali Shehper
A. Medina-Mardones
Lucas Fagan
Angus Gruen
Piotr Kucharski
Sergei Gukov
Zhenghan Wang
32
3
0
27 Aug 2024
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
27
50
0
03 Dec 2023
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Adithya Renduchintala
Tugrul Konuk
Oleksii Kuchaiev
MoMe
23
41
0
16 Nov 2023
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
Neha Verma
Kenton W. Murray
Kevin Duh
14
0
0
23 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
Christos Baziotis
Biao Zhang
Alexandra Birch
Barry Haddow
30
2
0
23 May 2023
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization
Yi-Syuan Chen
Yun-Zhu Song
Hong-Han Shuai
33
6
0
24 Mar 2023
Generative Adversarial Training Can Improve Neural Language Models
Sajad Movahedi
A. Shakery
GAN
AI4CE
34
2
0
02 Nov 2022
Bilingual Synchronization: Restoring Translational Relationships with Editing Operations
Jitao Xu
Josep Crego
François Yvon
30
4
0
24 Oct 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
16
10
0
20 Jul 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
27
4
0
19 May 2022
Joint Generation of Captions and Subtitles with Dual Decoding
Jitao Xu
François Buet
Josep Crego
Elise Bertin-Lemée
François Yvon
22
8
0
13 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
48
37
0
02 May 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
20
5
0
23 Mar 2022
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
38
55
0
15 Sep 2021
Training Graph Neural Networks with 1000 Layers
Guohao Li
Matthias Müller
V. Koltun
GNN
AI4CE
51
235
0
14 Jun 2021
Spectral Pruning for Recurrent Neural Networks
Takashi Furuya
Kazuma Suetake
K. Taniguchi
Hiroyuki Kusumoto
Ryuji Saiin
Tomohiro Daimon
27
4
0
23 May 2021
The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
Vassilina Nikoulina
Maxat Tezekbayev
Nuradil Kozhakhmet
Madina Babazhanova
Matthias Gallé
Z. Assylbekov
34
8
0
02 Mar 2021
Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press
Noah A. Smith
M. Lewis
230
89
0
31 Dec 2020
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
95
142
0
24 Oct 2020
Pruning Convolutional Filters using Batch Bridgeout
Najeeb Khan
Ian Stavness
23
3
0
23 Sep 2020
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi
Mario Fritz
WaLM
23
143
0
07 Sep 2020
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai
Nikolaos Pappas
Hao Peng
James Cross
Noah A. Smith
38
134
0
18 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
432
0
11 Jun 2020
An Overview of Neural Network Compression
James O'Neill
AI4CE
45
98
0
05 Jun 2020
rTop-k: A Statistical Estimation Approach to Distributed SGD
L. P. Barnes
Huseyin A. Inan
Berivan Isik
Ayfer Özgür
32
65
0
21 May 2020
Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning
Shaoxiong Ji
Wenqi Jiang
A. Walid
Xue Li
FedML
28
66
0
21 Mar 2020
ProGen: Language Modeling for Protein Generation
Ali Madani
Bryan McCann
Nikhil Naik
N. Keskar
N. Anand
Raphael R. Eguchi
Po-Ssu Huang
R. Socher
26
275
0
08 Mar 2020
A deep-learning view of chemical space designed to facilitate drug discovery
P. Maragakis
Hunter M. Nisonoff
B. Cole
D. Shaw
34
28
0
07 Feb 2020
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
19
68
0
26 Nov 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
11
87
0
10 Nov 2019
Federated Evaluation of On-device Personalization
Kangkang Wang
Rajiv Mathews
Chloé Kiddon
Hubert Eichner
F. Beaufays
Daniel Ramage
FedML
13
282
0
22 Oct 2019
Searching for A Robust Neural Architecture in Four GPU Hours
Xuanyi Dong
Yezhou Yang
20
646
0
10 Oct 2019
CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar
Bryan McCann
L. Varshney
Caiming Xiong
R. Socher
AI4CE
57
1,233
0
11 Sep 2019
Relating Simple Sentence Representations in Deep Neural Networks and the Brain
Sharmistha Jat
Hao Tang
Partha P. Talukdar
Tom Michael Mitchell
22
21
0
27 Jun 2019
Language Models with Transformers
Chenguang Wang
Mu Li
Alex Smola
15
120
0
20 Apr 2019
Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization
Yangyang Shi
M. Hwang
X. Lei
Haoyu Sheng
26
25
0
08 Apr 2019
Context Vectors are Reflections of Word Vectors in Half the Dimensions
Z. Assylbekov
Rustem Takhanov
6
10
0
26 Feb 2019
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
O. Ganea
Sylvain Gelly
Gary Bécigneul
Aliaksei Severyn
21
18
0
21 Feb 2019
Learning Private Neural Language Modeling with Attentive Aggregation
Shaoxiong Ji
Shirui Pan
Guodong Long
Xue Li
Jing Jiang
Zi Huang
FedML
MoMe
16
136
0
17 Dec 2018
Federated Learning for Mobile Keyboard Prediction
Andrew Straiton Hard
Kanishka Rao
Zhifeng Lin
Swaroop Indra Ramaswamy
Youjie Li
S. Augenstein
A. Schwing
M. Annavaram
A. Avestimehr
FedML
9
1,510
0
08 Nov 2018
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Alessandro Sordoni
Aaron Courville
32
322
0
22 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
21
387
0
28 Sep 2018
Distilled Wasserstein Learning for Word Embedding and Topic Modeling
Hongteng Xu
Wenlin Wang
Wen Liu
Lawrence Carin
MedIm
FedML
32
84
0
12 Sep 2018
Direct Output Connection for a High-Rank Language Model
Sho Takase
Jun Suzuki
Masaaki Nagata
18
36
0
30 Aug 2018
Pyramidal Recurrent Unit for Language Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
21
10
0
27 Aug 2018
Neural Document Summarization by Jointly Learning to Score and Select Sentences
Qingyu Zhou
Nan Yang
Furu Wei
Shaohan Huang
M. Zhou
T. Zhao
20
320
0
06 Jul 2018
GILE: A Generalized Input-Label Embedding for Text Classification
Nikolaos Pappas
James Henderson
AI4TS
AILaw
VLM
27
79
0
16 Jun 2018