OPT: Open Pre-trained Transformer Language Models

2 May 2022

Xian Li

Luke Zettlemoyer

Papers citing "OPT: Open Pre-trained Transformer Language Models"

50 / 2,924 papers shown

MERGE: Minimal Expression-Replacement GEneralization Test for Natural Language Inference

Mădălina Zgreabăn

Tejaswini Deoskar

Lasha Abzianidze

123

28 Oct 2025

CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents

304

27 Oct 2025

MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs

324

27 Oct 2025

Learning "Partner-Aware" Collaborators in Multi-Party Collaboration

Abhijnan Nath

Nikhil Krishnaswamy

135

26 Oct 2025

Label Smoothing Improves Gradient Ascent in LLM Unlearning

192

25 Oct 2025

LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism

138

24 Oct 2025

Efficient semantic uncertainty quantification in language models via diversity-steered sampling

Ji Won Park

K. Cho

134

24 Oct 2025

Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach

195

24 Oct 2025

Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025

146

23 Oct 2025

Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction

106

23 Oct 2025

Relative-Based Scaling Law for Neural Language Models

147

23 Oct 2025

Capability Ceilings in Autoregressive Language Models: Empirical Evidence from Knowledge-Intensive Tasks

Javier Marín

103

23 Oct 2025

On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization

Shaocong Ma

Heng Huang

141

22 Oct 2025

Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations

International Conference on Learning Representations (ICLR), 2025

Shaocong Ma

Heng Huang

160

22 Oct 2025

Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation

174

22 Oct 2025

What is the Best Sequence Length for BABYLM?

Suchir Salhan

Richard Diehl Martinez

Zébulon Goriely

P. Buttery

108

22 Oct 2025

Learning Human-Object Interaction as Groups

Jiajun Hong

Jianan Wei

Wenguan Wang

152

21 Oct 2025

BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining

153

21 Oct 2025

Towards Fast LLM Fine-tuning through Zeroth-Order Optimization with Projected Gradient-Aligned Perturbations

152

21 Oct 2025

DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning

247

20 Oct 2025

All You Need is One: Capsule Prompt Tuning with a Single Vector

147

19 Oct 2025

Graph4MM: Weaving Multimodal Learning with Structural Information

132

19 Oct 2025

RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba

...

145

18 Oct 2025

TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

186

17 Oct 2025

Zeroth-Order Sharpness-Aware Learning with Exponential Tilting

Xuchen Gong

Tian Li

148

17 Oct 2025

DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

150

17 Oct 2025

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

16 Oct 2025

MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos

458

16 Oct 2025

CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection

386

16 Oct 2025

MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving

119

16 Oct 2025

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

336

15 Oct 2025

Towards Reversible Model Merging For Low-rank Weights

Mohammadsajad Alipour

Mohammad Mohammadi Amiri

MoMe

160

15 Oct 2025

Bolster Hallucination Detection via Prompt-Guided Data Augmentation

189

13 Oct 2025

Softmax $\geq$ Linear: Transformers may learn to classify in-context by kernel gradient descent

147

12 Oct 2025

Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024

173

12 Oct 2025

Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?

...

156

12 Oct 2025

PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models

11 Oct 2025

On the Provable Performance Guarantee of Efficient Reasoning Models

137

10 Oct 2025

FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference

10 Oct 2025

Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers

147

10 Oct 2025

Cocoon: A System Architecture for Differentially Private Training with Correlated Noises

143

08 Oct 2025

AWM: Accurate Weight-Matrix Fingerprint for Large Language Models

126

08 Oct 2025

Adaptive Stain Normalization for Cross-Domain Medical Histology

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

148

08 Oct 2025

Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography

128

08 Oct 2025

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

127

08 Oct 2025

Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes

Peter Ochieng

07 Oct 2025

Staircase Streaming for Low-Latency Multi-Agent Inference

186

06 Oct 2025

Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving

...

118

06 Oct 2025

LongTail-Swap: benchmarking language models' abilities on rare words

Robin Algayres

Charles-Éric Saint-James

115

05 Oct 2025

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

193

05 Oct 2025