2205.01068
v4 (latest)
OPT: Open Pre-trained Transformer Language Models
2 May 2022
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
Shuohui Chen
Christopher Dewan
Mona T. Diab
Xian Li
Xi Lin
Todor Mihaylov
Myle Ott
Sam Shleifer
Kurt Shuster
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
Papers citing
"OPT: Open Pre-trained Transformer Language Models"
50 / 2,924 papers shown
Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection
Z. Wu
Xinze Li
Zhenghao Liu
Shi Yu
Zhiyuan Liu
Minghe Yu
Cheng Yang
Yu Gu
Ge Yu
Maosong Sun
LRM
318
0
0
28 May 2025
Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning
Yongkang Liu
Xingle Xu
Ercong Nie
Zijing Wang
Shi Feng
Daling Wang
Qian Li
Hinrich Schütze
193
1
0
28 May 2025
Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
Yeshwanth Venkatesha
Souvik Kundu
Priyadarshini Panda
166
7
0
27 May 2025
Test-Time Learning for Large Language Models
Jinwu Hu
Zhitian Zhang
Guohao Chen
Xutao Wen
Chao Shuai
Wei Luo
Bin Xiao
Yuanqing Li
Zhuliang Yu
440
13
0
27 May 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu
Yi Ge
Yichen You
Enshu Liu
Zhihang Yuan
Guohao Dai
Shengen Yan
Huazhong Yang
Yu Wang
MoE
LRM
547
10
0
27 May 2025
Pretraining Language Models to Ponder in Continuous Space
Boyi Zeng
Shixiang Song
Siyuan Huang
Yixuan Wang
He Li
Ziwei He
Xinbing Wang
Zhiyu Li
Zhouhan Lin
LRM
367
12
0
27 May 2025
MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zikang Guo
Benfeng Xu
Xiaorui Wang
Zhendong Mao
385
2
0
27 May 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
343
3
0
26 May 2025
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
Hao Kang
Zichun Yu
Chenyan Xiong
MoE
284
2
0
26 May 2025
Frictional Agent Alignment Framework: Slow Down and Don't Break Things
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Abhijnan Nath
Carine Graff
Andrei Bachinin
Nikhil Krishnaswamy
320
4
0
26 May 2025
Towards Harmonized Uncertainty Estimation for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Rui Li
Jing Long
Muge Qi
Heming Xia
Lei Sha
Peiyi Wang
Zhifang Sui
UQCV
253
0
0
25 May 2025
eACGM: Non-instrumented Performance Tracing and Anomaly Detection towards Machine Learning Systems
International Workshop on Quality of Service (IWQoS), 2025
Ruilin Xu
Zongxuan Xie
Pengfei Chen
54
0
0
25 May 2025
Rethinking the Understanding Ability across LLMs through Mutual Information
Shaojie Wang
Sirui Ding
Na Zou
353
1
0
25 May 2025
Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ming Cheng
Jiaying Gong
Hoda Eldardiry
AI4CE
206
1
0
24 May 2025
KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
Zhendong Mi
Qitao Tan
Xiaodong Yu
Zining Zhu
Geng Yuan
Shaoyi Huang
362
4
0
24 May 2025
μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
T. Koike-Akino
Jing Liu
Ye Wang
MoE
230
0
0
24 May 2025
Understanding Gated Neurons in Transformers from Their Input-Output Functionality
Sebastian Gerstner
Hinrich Schütze
MILM
FAtt
381
0
0
23 May 2025
Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization
Francois Chaubard
Mykel J. Kochenderfer
MQ
AI4CE
395
3
0
23 May 2025
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
Zehua Pei
Ying Zhang
Hui-Ling Zhen
Xianzhi Yu
Wulong Liu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
MoE
182
0
0
23 May 2025
LatentLLM: Attention-Aware Joint Tensor Compression
T. Koike-Akino
Xiangyu Chen
Jing Liu
Ye Wang
Wang
Matthew Brand
233
3
0
23 May 2025
Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng
Jinyang Wu
Siyuan Liu
Shuai Zhang
Hongjian Fang
Ruihan Jin
Feihu Che
Pengpeng Shao
Zhengqi Wen
377
0
0
23 May 2025
SELF: Self-Extend the Context Length With Logistic Growth Function
Phat Thanh Dang
Saahil Thoppay
Wang Yang
Qifan Wang
Vipin Chaudhary
Xiaotian Han
272
0
0
22 May 2025
Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting
Bang Trinh Tran To
Thai Le
MU
KELM
189
4
0
22 May 2025
LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead
Yifan Zhang
Xinkui Zhao
Zuxin Wang
Guanjie Cheng
Yueshen Xu
Shuiguang Deng
Yuxiang Cai
232
3
0
22 May 2025
TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning
Florentin Beck
William Rudman
Carsten Eickhoff
379
1
0
22 May 2025
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics
Zhihang Cai
Xingjun Zhang
Zhendong Tan
Zheng Wei
MQ
395
3
0
22 May 2025
AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training
Huishuai Zhang
Bohan Wang
Luoxin Chen
ODL
502
2
0
22 May 2025
Incremental Sequence Classification with Temporal Consistency
Lucas Maystre
Gabriel Barello
Tudor Berariu
Aleix Cambray
Rares Dolga
Alvaro Ortega Gonzalez
Andrei Nica
David Barber
CLL
287
0
0
22 May 2025
SUS backprop: linear backpropagation algorithm for long inputs in transformers
Sergey Pankov
Georges Harik
346
0
0
21 May 2025
Establishing a Scale for Kullback–Leibler Divergence in Language Models Across Various Settings
Ryo Kishino
Yusuke Takase
Momose Oyama
Hiroaki Yamagiwa
Hidetoshi Shimodaira
341
0
0
21 May 2025
EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Weiqi Wang
Limeng Cui
Xin Liu
Jiapeng Liu
Wenju Xu
...
Y. Gao
Haiyang Zhang
Qi He
Shuiwang Ji
Yangqiu Song
409
10
0
21 May 2025
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack
Silvia Cappelletti
Tobia Poppi
Samuele Poppi
Zheng-Xin Yong
Diego Garcia-Olano
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
AAML
219
0
0
21 May 2025
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2025
Xingxing Weng
Chao Pang
Gui-Song Xia
VLM
394
12
0
20 May 2025
Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hong Huang
Dapeng Wu
414
5
0
20 May 2025
Domain Gating Ensemble Networks for AI-Generated Text Detection
Arihant Tripathi
Liam Dugan
Charis Gao
Maggie Huan
Emma Jin
Peter Zhang
David Zhang
Julia Zhao
Chris Callison-Burch
VLM
211
0
0
20 May 2025
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Sifeng Shang
Jiayi Zhou
Chenyu Lin
Minxian Li
Kaiyang Zhou
MQ
356
1
0
19 May 2025
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
Lihong Chen
Hossein Hassani
Soodeh Nikan
VLM
330
4
0
19 May 2025
Know3-RAG: A Knowledge-aware RAG Framework with Adaptive Retrieval, Generation, and Filtering
Xukai Liu
Ye Liu
Shiwen Wu
Yanghai Zhang
Yihao Yuan
Kai Zhang
Qi Liu
350
0
0
19 May 2025
Vectors from Larger Language Models Predict Human Reading Time and fMRI Data More Poorly when Dimensionality Expansion is Controlled
Yi-Chien Lin
Hongao Zhu
William Schuler
207
3
0
18 May 2025
Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chenlu Wang
Weimin Lyu
Ritwik Banerjee
219
0
0
17 May 2025
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman
Zhao Song
361
23
0
17 May 2025
The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Rui Zhang
Yun Shen
Hongwei Li
Wenbo Jiang
Hanxiao Chen
Yuan Zhang
Guowen Xu
Yang Zhang
SILM
AAML
238
0
0
16 May 2025
From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yidan Wang
Yubing Ren
Yanan Cao
Binxing Fang
342
2
0
15 May 2025
Superposition Yields Robust Neural Scaling
Yizhou Liu
Ziming Liu
Jeff Gore
MILM
660
4
0
15 May 2025
MorphMark: Flexible Adaptive Watermarking for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zongqi Wang
Tianle Gu
Baoyuan Wu
Yujiu Yang
WaLM
369
9
0
14 May 2025
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
309
0
0
13 May 2025
Detecting Prefix Bias in LLM-based Reward Models
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Ashwin Kumar
Yuzi He
Aram H. Markosyan
Bobbie Chern
Imanol Arrieta-Ibarra
267
6
0
13 May 2025
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
IEEE Symposium on Security and Privacy (S&P), 2025
Guang Yan
Yuhui Zhang
Zimu Guo
Lutan Zhao
Xiaojun Chen
Chen Wang
Wenhao Wang
Dan Meng
Rui Hou
306
2
0
12 May 2025
Whitened CLIP as a Likelihood Surrogate of Images and Captions
Roy Betser
Meir Yossef Levi
Guy Gilboa
268
3
0
11 May 2025
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
Haolin Zhang
Jeff Huang
228
3
0
09 May 2025