Representation Degeneration Problem in Training Natural Language Generation Models

International Conference on Learning Representations (ICLR), 2019

28 July 2019

Xu Tan

Papers citing "Representation Degeneration Problem in Training Natural Language Generation Models"


SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Yixuan Tang

Yi Yang

ALM

159

02 Dec 2025

Lifting Manifolds to Mitigate Pseudo-Alignment in LLM4TS

Liangwei Nathan Zheng

112

14 Oct 2025

Scaling Language-Centric Omnimodal Representation Learning

143

13 Oct 2025

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

151

02 Oct 2025

Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval

119

30 Sep 2025

Demystifying Network Foundation Models

194

27 Sep 2025

Binary Autoencoder for Mechanistic Interpretability of Large Language Models

200

25 Sep 2025

Probability Signature: Bridging Data Semantics and Embedding Structure in Language Models

Junjie Yao

Zhi-hai Xu

139

24 Sep 2025

Angular Dispersion Accelerates k-Nearest Neighbors Machine Translation

Evgeniia Tokarchuk

S. Troshin

Vlad Niculae

105

20 Sep 2025

Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation

Kelin Ren

Chan-Yang Ju

Dong-Ho Lee

112

11 Sep 2025

ECG-Soup: Harnessing Multi-Layer Synergy for ECG Foundation Models

P. Nguyen

Huy P Phan

Hieu Pham

Christos Chatzichristos

Bert Vandenberk

M. D. Vos

MedIm

264

27 Aug 2025

Vec2Summ: Text Summarization via Probabilistic Sentence Embeddings

Mao Li

Fred Conrad

Johann Gagnon-Bartsch

09 Aug 2025

From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

343

20 Jul 2025

Large Language Models Encode Semantics and Alignment in Linearly Separable Representations

161

13 Jul 2025

Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering

IEEE International Conference on Data Engineering (ICDE), 2025

381

09 May 2025

llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length

Issa Sugiura

Kouta Nakayama

Yusuke Oda

182

22 Apr 2025

Measuring Intrinsic Dimension of Token Embeddings

Takuya Kataiwa

Cho Hakaze

Tetsushi Ohki

231

04 Mar 2025

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Yize Zhao

Tina Behnia

V. Vakilian

Christos Thrampoulidis

427

20 Feb 2025

DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning

Transactions of the Association for Computational Linguistics (TACL), 2024

414

01 Feb 2025

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

...

457

389

18 Dec 2024

USTCCTSU at SemEval-2024 Task 1: Reducing Anisotropy for Cross-lingual Semantic Textual Relatedness Task

International Workshop on Semantic Evaluation (SemEval), 2024

346

28 Nov 2024

Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers

bioRxiv (bioRxiv), 2024

422

29 Oct 2024

Building A Coding Assistant via the Retrieval-Augmented Language Model

Xinze Li

186

21 Oct 2024

Self-Supervised Learning of Disentangled Representations for Multivariate Time-Series

274

16 Oct 2024

How much do contextualized representations encode long-range context?

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Simeng Sun

Cheng-Ping Hsieh

336

16 Oct 2024

Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation

Hung-Chieh Fang

Po-Yi Lu

Hsuan-Tien Lin

260

15 Oct 2024

Improving Long-Text Alignment for Text-to-Image Diffusion Models

International Conference on Learning Representations (ICLR), 2024

311

15 Oct 2024

Contrastive Learning for Implicit Social Factors in Social Media Popularity Prediction

Zhizhen Zhang

Ruihong Qiu

Xiaohui Xie

186

12 Oct 2024

CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression

Wenyuan Liu

Xindian Ma

Peng Zhang

Yan Wang

167

10 Oct 2024

Unveiling Transformer Perception by Exploring Input Manifolds

348

08 Oct 2024

NoTeeline: Supporting Real-Time, Personalized Notetaking with LLM-Enhanced Micronotes

International Conference on Intelligent User Interfaces (IUI), 2024

224

24 Sep 2024

Diversity-grounded Channel Prototypical Learning for Out-of-Distribution Intent Detection

Bo Liu

Liming Zhan

Xiao-Ming Wu

263

17 Sep 2024

Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning

ACM Multimedia (MM), 2024

Can Gao

Linlin Shen

212

08 Aug 2024

Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models

Ying Zhang

Zhuoran Liu

Manabu Okumura

180

02 Aug 2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Chaofan Tao

Qian Liu

Longxu Dou

Niklas Muennighoff

Zhongwei Wan

Ping Luo

Min Lin

Ngai Wong

PILM

319

18 Jul 2024

One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning

Bo Wang

Tsunenori Mine

AAML

268

12 Jul 2024

Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance

Anna C. Marbut

John W. Chandler

Travis J. Wheeler

284

18 Jun 2024

Understanding Token Probability Encoding in Output Embeddings

296

03 Jun 2024

Understanding and Minimising Outlier Features in Neural Network Training

291

29 May 2024

On the Role of Attention Masks and LayerNorm in Transformers

258

29 May 2024

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

Mustafa Shukor

Matthieu Cord

300

26 May 2024

DefSent+: Improving sentence embeddings of language models by projecting definition sentences into a quasi-isotropic or isotropic vector space of unlimited dictionary entries

Xiaodong Liu

405

25 May 2024

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Nathan Godey

Eric Villemonte de la Clergerie

Benoît Sagot

198

11 Apr 2024

Understanding Cross-Lingual Alignment -- A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Katharina Hämmerl

Jindřich Libovický

Kangyang Luo

270

09 Apr 2024

Event-enhanced Retrieval in Real-time Search

Yanan Zhang

Xiaoling Bai

Tianhua Zhou

231

09 Apr 2024

LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection

IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024

Xiaojie Yuan

225

14 Mar 2024

Pixel Sentence Representation Learning

Noura Al Moubayed

207

13 Feb 2024

NNOSE: Nearest Neighbor Occupational Skill Extraction

247

30 Jan 2024

Anisotropy Is Inherent to Self-Attention in Transformers

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

Nathan Godey

Eric Villemonte de la Clergerie

Benoît Sagot

247

22 Jan 2024

Why "classic" Transformers are shallow and how to make them go deep

Yueyao Yu

Yin Zhang

ViT

271

11 Dec 2023