ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

OPT: Open Pre-trained Transformer Language Models
Versions: v1, v2, v3, v4 (latest)


2 May 2022
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
Shuohui Chen
Christopher Dewan
Mona T. Diab
Xian Li
Xi Lin
Todor Mihaylov
Myle Ott
Sam Shleifer
Kurt Shuster
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
    VLM, OSLM, AI4CE
ArXiv (abs), PDF, HTML, HuggingFace (2 upvotes)

Papers citing "OPT: Open Pre-trained Transformer Language Models"

50 / 2,924 papers shown
MERGE: Minimal Expression-Replacement GEneralization Test for Natural Language Inference
Mădălina Zgreabăn
Tejaswini Deoskar
Lasha Abzianidze
123
0
0
28 Oct 2025
CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents
Zesen Liu
Z. Zhang
Yuchong Xie
Dongdong She
AAML
303
0
0
27 Oct 2025
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
Yucheng Ning
Xixun Lin
Fang Fang
Yanan Cao
HILM
321
0
0
27 Oct 2025
Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
Abhijnan Nath
Nikhil Krishnaswamy
135
0
0
26 Oct 2025
Label Smoothing Improves Gradient Ascent in LLM Unlearning
Zirui Pang
Hao Zheng
Zhijie Deng
Ling Li
Zixin Zhong
Jiaheng Wei
MU
192
1
0
25 Oct 2025
LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism
Olusola Babalola
Bolanle Ojokoh
Olutayo Boyinbode
SyDa
137
0
0
24 Oct 2025
Efficient semantic uncertainty quantification in language models via diversity-steered sampling
Ji Won Park
K. Cho
134
0
0
24 Oct 2025
Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach
Dandan Liang
Jianing Zhang
Evan Chen
Zhe Li
Rui Li
Haibo Yang
FedML
195
1
0
24 Oct 2025
Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Dean L. Slack
G. Hudson
T. Winterbottom
Noura Al Moubayed
146
0
0
23 Oct 2025
Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction
Suchir Salhan
Hongyi Gu
Donya Rooein
Diana Galván-Sosa
Gabrielle Gaudeau
Andrew Caines
Zheng Yuan
P. Buttery
106
1
0
23 Oct 2025
Relative-Based Scaling Law for Neural Language Models
Baoqing Yue
Jinyuan Zhou
Zixi Wei
Jingtao Zhan
Qingyao Ai
Yiqun Liu
147
0
0
23 Oct 2025
Capability Ceilings in Autoregressive Language Models: Empirical Evidence from Knowledge-Intensive Tasks
Javier Marín
103
0
0
23 Oct 2025
On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization
Shaocong Ma
Heng Huang
140
2
0
22 Oct 2025
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
International Conference on Learning Representations (ICLR), 2025
Shaocong Ma
Heng Huang
160
12
0
22 Oct 2025
Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation
Chenyu Wang
Zhanglu Yan
Zhi Zhou
Xu Chen
Weng-Fai Wong
MQ
171
0
0
22 Oct 2025
What is the Best Sequence Length for BABYLM?
Suchir Salhan
Richard Diehl Martinez
Zébulon Goriely
P. Buttery
108
2
0
22 Oct 2025
Learning Human-Object Interaction as Groups
Jiajun Hong
Jianan Wei
Wenguan Wang
152
0
0
21 Oct 2025
BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining
Ajinkya Khoche
Gergő László Nagy
Maciej K. Wozniak
Thomas Gustafsson
Patric Jensfelt
152
0
0
21 Oct 2025
Towards Fast LLM Fine-tuning through Zeroth-Order Optimization with Projected Gradient-Aligned Perturbations
Zhendong Mi
Qitao Tan
Grace Li Zhang
Zhaozhuo Xu
Geng Yuan
Shaoyi Huang
152
0
0
21 Oct 2025
DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
Yongxin He
Shan Zhang
Yixuan Cao
Lei Ma
Ping Luo
DeLMO
247
1
0
20 Oct 2025
All You Need is One: Capsule Prompt Tuning with a Single Vector
Yiyang Liu
James Chenhao Liang
Heng Fan
Wenhao Yang
Yiming Cui
Xiaotian Han
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
VLM
146
3
0
19 Oct 2025
Graph4MM: Weaving Multimodal Learning with Structural Information
Xuying Ning
Dongqi Fu
Tianxin Wei
Wujiang Xu
Jingrui He
132
5
0
19 Oct 2025
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba
Kunyu Peng
Di Wen
Jia Fu
Jiamin Wu
Kailun Yang
...
Yufan Chen
Yuqian Fu
D. Paudel
Luc Van Gool
Rainer Stiefelhagen
145
0
0
18 Oct 2025
TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs
Sibo Xiao
Jinyuan Fu
Zhongle Xie
Lidan Shou
AI4TS
186
0
0
17 Oct 2025
Zeroth-Order Sharpness-Aware Learning with Exponential Tilting
Xuchen Gong
Tian Li
148
0
0
17 Oct 2025
DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing
Ting Qiao
Xing Liu
Wenke Huang
Jianbin Li
Zhaoxin Fan
Yiming Li
AAML
150
1
0
17 Oct 2025
A Free Lunch in LLM Compression: Revisiting Retraining after Pruning
Moritz Wagner
Christophe Roux
Max Zimmer
Sebastian Pokutta
78
0
0
16 Oct 2025
MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos
Gabriel Fiastre
Antoine Yang
Cordelia Schmid
VOS
454
1
0
16 Oct 2025
CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD, LRM, VLM
383
1
0
16 Oct 2025
MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving
Jungi Lee
Junyong Park
Soohyun Cha
Jaehoon Cho
Jaewoong Sim
119
2
0
16 Oct 2025
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Nikhil Bhendawade
K. Nishu
Arnav Kundu
Chris Bartels
Minsik Cho
Irina Belousova
LRM
336
0
0
15 Oct 2025
Towards Reversible Model Merging For Low-rank Weights
Mohammadsajad Alipour
Mohammad Mohammadi Amiri
MoMe
157
0
0
15 Oct 2025
Bolster Hallucination Detection via Prompt-Guided Data Augmentation
Wenyun Li
Zheng Zhang
Dongmei Jiang
Xiangyuan Lan
HILM
187
0
0
13 Oct 2025
Softmax $\geq$ Linear: Transformers may learn to classify in-context by kernel gradient descent
Sara Dragutinovic
Andrew Saxe
Aaditya K. Singh
MLT
147
1
0
12 Oct 2025
Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024
Tuowei Wang
Kun Li
Zixu Hao
Donglin Bai
Ju Ren
Yaoxue Zhang
Ting Cao
M. Yang
166
4
0
12 Oct 2025
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
Shaobo Wang
C. Wang
Wenjie Fu
Yue Min
Mingquan Feng
...
Kexin Yang
Xingzhang Ren
Fei Huang
Dayiheng Liu
Linfeng Zhang
156
0
0
12 Oct 2025
PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models
Lancheng Zou
Shuo Yin
Zehua Pei
Tsung-Yi Ho
Farzan Farnia
Bei Yu
88
0
0
11 Oct 2025
On the Provable Performance Guarantee of Efficient Reasoning Models
Hao Zeng
Jianguo Huang
Bingyi Jing
Hongxin Wei
Bo An
LRM
129
1
0
10 Oct 2025
FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference
Yu-Chen Lu
Chong-Yan Chen
Chi-Chih Chang
Yu-Fang Hu
Kai-Chiang Wu
84
1
0
10 Oct 2025
Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
Rui Bu
Haofeng Zhong
Wenzheng Chen
Yangyan Li
141
0
0
10 Oct 2025
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises
Donghwan Kim
Xin Gu
Jinho Baek
Timothy Lo
Younghoon Min
Kwangsik Shin
Jongryool Kim
J. Park
Kiwan Maeng
143
0
0
08 Oct 2025
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Boyi Zeng
Lin Chen
Ziwei He
Xinbing Wang
Zhouhan Lin
121
0
0
08 Oct 2025
Adaptive Stain Normalization for Cross-Domain Medical Histology
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Tianyue Xu
Yanlin Wu
Abhai K. Tripathi
Matthew M. Ippolito
Benjamin D. Haeffele
OOD, MedIm
148
0
0
08 Oct 2025
Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography
Jiuan Zhou
Yu Cheng
Yuan Xie
Z. Yin
127
4
0
08 Oct 2025
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Soyeong Jeong
Taehee Jung
Sung Ju Hwang
Joo-Kyung Kim
Luan Tuyen Chau
LLMAG, LRM
125
0
0
08 Oct 2025
Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes
Peter Ochieng
94
1
0
07 Oct 2025
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang
Jue Wang
Zhen
Ben Athiwaratkun
Bhuwan Dhingra
Ce Zhang
James Y. Zou
186
0
0
06 Oct 2025
Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving
Yue Pan
Zihan Xia
Po-Kai Hsu
Lanxiang Hu
Hyungyo Kim
...
Minxuan Zhou
Nam Sung Kim
Shimeng Yu
Tajana Rosing
Mingu Kang
MoE
116
3
0
06 Oct 2025
LongTail-Swap: benchmarking language models' abilities on rare words
Robin Algayres
Charles-Éric Saint-James
Mahi Luthra
Jiayi Shen
Dongyan Lin
Youssef Benchekroun
Rashel Moritz
Juan Pino
Emmanuel Dupoux
115
0
0
05 Oct 2025
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
Minseo Kim
Coleman Hooper
Aditya Tomar
Chenfeng Xu
Mehrdad Farajtabar
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
190
2
0
05 Oct 2025
Page 2 of 59