OPT: Open Pre-trained Transformer Language Models

2 May 2022

Xian Li

Luke Zettlemoyer

Papers citing "OPT: Open Pre-trained Transformer Language Models"

50 / 2,924 papers shown

Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection

318

28 May 2025

Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning

193

28 May 2025

Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits

Yeshwanth Venkatesha

Souvik Kundu

Priyadarshini Panda

166

27 May 2025

Test-Time Learning for Large Language Models

440

27 May 2025

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

547

27 May 2025

Pretraining Language Models to Ponder in Continuous Space

367

27 May 2025

MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning

International Joint Conference on Artificial Intelligence (IJCAI), 2025

385

27 May 2025

ResSVD: Residual Compensated SVD for Large Language Model Compression

343

26 May 2025

FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models

284

26 May 2025

Frictional Agent Alignment Framework: Slow Down and Don't Break Things

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

332

26 May 2025

Towards Harmonized Uncertainty Estimation for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

253

25 May 2025

eACGM: Non-instrumented Performance Tracing and Anomaly Detection towards Machine Learning Systems

International Workshop on Quality of Service (IWQoS), 2025

Ruilin Xu

Zongxuan Xie

Pengfei Chen

25 May 2025

Rethinking the Understanding Ability across LLMs through Mutual Information

Shaojie Wang

Sirui Ding

Na Zou

353

25 May 2025

Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

206

24 May 2025

KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning

362

24 May 2025

μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts

230

24 May 2025

Understanding Gated Neurons in Transformers from Their Input-Output Functionality

Sebastian Gerstner

Hinrich Schütze

MILM FAtt

381

23 May 2025

Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization

Francois Chaubard

Mykel J. Kochenderfer

MQ AI4CE

398

23 May 2025

PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval

182

23 May 2025

LatentLLM: Attention-Aware Joint Tensor Compression

233

23 May 2025

Two-Stage Regularization-Based Structured Pruning for LLMs

377

23 May 2025

SELF: Self-Extend the Context Length With Logistic Growth Function

272

22 May 2025

Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting

Bang Trinh Tran To

Thai Le

MU KELM

189

22 May 2025

LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead

232

22 May 2025

TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning

Florentin Beck

William Rudman

Carsten Eickhoff

379

22 May 2025

NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics

395

22 May 2025

AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training

502

22 May 2025

Incremental Sequence Classification with Temporal Consistency

Alvaro Ortega Gonzalez

Andrei Nica

David Barber

CLL

287

22 May 2025

SUS backprop: linear backpropagation algorithm for long inputs in transformers

Sergey Pankov

Georges Harik

346

21 May 2025

Establishing a Scale for Kullback-Leibler Divergence in Language Models Across Various Settings

341

21 May 2025

EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

409

21 May 2025

Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack

219

21 May 2025

Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives

IEEE Geoscience and Remote Sensing Magazine (GRSM), 2025

394

20 May 2025

Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Hong Huang

Dapeng Wu

414

20 May 2025

Domain Gating Ensemble Networks for AI-Generated Text Detection

211

20 May 2025

Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

356

19 May 2025

TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning

330

19 May 2025

Know3-RAG: A Knowledge-aware RAG Framework with Adaptive Retrieval, Generation, and Filtering

350

19 May 2025

Vectors from Larger Language Models Predict Human Reading Time and fMRI Data More Poorly when Dimensionality Expansion is Controlled

Yi-Chien Lin

Hongao Zhu

William Schuler

207

18 May 2025

Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Chenlu Wang

Weimin Lyu

Ritwik Banerjee

219

17 May 2025

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform

Josh Alman

Zhao Song

361

17 May 2025

The Ripple Effect: On Unforeseen Complications of Backdoor Attacks

238

16 May 2025

From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

342

15 May 2025

Superposition Yields Robust Neural Scaling

660

15 May 2025

MorphMark: Flexible Adaptive Watermarking for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

369

14 May 2025

Resource-Efficient Language Models: Quantization for Fast and Accessible Inference

Tollef Emil Jørgensen

309

13 May 2025

Detecting Prefix Bias in LLM-based Reward Models

Conference on Fairness, Accountability and Transparency (FAccT), 2025

Imanol Arrieta-Ibarra

270

13 May 2025

Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity

IEEE Symposium on Security and Privacy (S&P), 2025

306

12 May 2025

Whitened CLIP as a Likelihood Surrogate of Images and Captions

Roy Betser

Meir Yossef Levi

Guy Gilboa

268

11 May 2025

Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference

Haolin Zhang

Jeff Huang

228

09 May 2025