OPT: Open Pre-trained Transformer Language Models

2 May 2022

Xian Li

Luke Zettlemoyer

Papers citing "OPT: Open Pre-trained Transformer Language Models"

50 / 2,924 papers shown

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression

Neural Information Processing Systems (NeurIPS), 2023

Shuai Li

201

26 Apr 2023

The Internal State of an LLM Knows When It's Lying

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

A. Azaria

Tom Michael Mitchell

HILM

649

492

26 Apr 2023

SCM: Enhancing Large Language Model with Self-Controlled Memory Framework

Jian Yang

Zhoujun Li

383

26 Apr 2023

Stable and low-precision training for large-scale vision-language models

Neural Information Processing Systems (NeurIPS), 2023

Mitchell Wortsman

Tim Dettmers

Luke Zettlemoyer

330

25 Apr 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

AAAI Conference on Artificial Intelligence (AAAI), 2023

Rongjie Huang

Mingze Li

Dongchao Yang

Jiatong Shi

...

Zhou Zhao

252

335

25 Apr 2023

PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques

Mohammed Sabry

Anya Belz

284

24 Apr 2023

Better Question-Answering Models on a Budget

Yudhanjaya Wijeratne

Ishan Marikar

ALM

24 Apr 2023

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

493

547

22 Apr 2023

Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism

Xiaotao Gu

103

22 Apr 2023

ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

...

Tianming Liu

Tuo Zhang

LRM

192

21 Apr 2023

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

International Conference on Learning Representations (ICLR), 2023

473

2,742

20 Apr 2023

Learning to Plan with Natural Language

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Dongyan Zhao

210

20 Apr 2023

Attention Scheme Inspired Softmax Regression

Yichuan Deng

Zhihang Li

Zhao Song

293

20 Apr 2023

Scaling Transformer to 1M tokens and beyond with RMT

339

111

19 Apr 2023

A Theory on Adam Instability in Large-Scale Machine Learning

Igor Molybog

...

200

19 Apr 2023

Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yuhang Li

279

18 Apr 2023

Visual Instruction Tuning

Neural Information Processing Systems (NeurIPS), 2023

1.2K

7,615

17 Apr 2023

LongForm: Effective Instruction Tuning with Reverse Instructions

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Abdullatif Köksal

279

17 Apr 2023

Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding

203

222

17 Apr 2023

An Evaluation on Large Language Model Outputs: Discourse and Memorization

Natural Language Processing Journal (JNLP), 2023

335

17 Apr 2023

Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation

Xiangang Li

242

16 Apr 2023

On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence

...

320

154

13 Apr 2023

ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review

Cognitive Computation (Cogn. Comput.), 2023

402

134

13 Apr 2023

Solving Tensor Low Cycle Rank Approximation

BigData Congress [Services Society] (BSS), 2023

Yichuan Deng

Yeqi Gao

Zhao Song

195

13 Apr 2023

Are LLMs All You Need for Task-Oriented Dialogue?

SIGDIAL Conferences (SIGDIAL), 2023

Vojtěch Hudeček

Ondřej Dušek

227

13 Apr 2023

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

380

731

13 Apr 2023

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Neural Information Processing Systems (NeurIPS), 2023

Xiao Liu

Yuxiao Dong

581

754

12 Apr 2023

HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

165

12 Apr 2023

ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Viet Dac Lai

Nghia Trung Ngo

Amir Pouran Ben Veyseh

263

362

12 Apr 2023

User Adaptive Language Learning Chatbots with a Curriculum

International Conference on Artificial Intelligence in Education (AIED), 2023

Kun Qian

199

11 Apr 2023

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Qingxiu Dong

Lingpeng Kong

Lei Li

373

229

10 Apr 2023

Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension

Yichuan Deng

Sridhar Mahadevan

Zhao Song

199

10 Apr 2023

OpenAGI: When LLM Meets Domain Experts

Neural Information Processing Systems (NeurIPS), 2023

Juntao Tan

330

310

10 Apr 2023

Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study

332

193

10 Apr 2023

A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

238

09 Apr 2023

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Zhiyuan Liu

175

08 Apr 2023

From Retrieval to Generation: Efficient and Effective Entity Set Expansion

International Conference on Information and Knowledge Management (CIKM), 2023

408

07 Apr 2023

Instruction Tuning with GPT-4

493

752

06 Apr 2023

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

348

124

06 Apr 2023

Zero-Shot Next-Item Recommendation using Large Pretrained Language Models

Lei Wang

Ee-Peng Lim

LRM

165

06 Apr 2023

Conceptual structure coheres in human cognition but not in large language models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

288

05 Apr 2023

Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks

ACM Transactions on Software Engineering and Methodology (TOSEM), 2023

Michael Weiss

Paolo Tonella

AI4CE

191

05 Apr 2023

Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

288

04 Apr 2023

Effective Theory of Transformers at Initialization

Emily Dinan

Sho Yaida

Susan Zhang

179

04 Apr 2023

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Lei Wang

318

388

04 Apr 2023

Resources and Few-shot Learners for In-context Learning in Slavic Languages

Workshop on Balto-Slavic Natural Language Processing (BSNLP), 2023

167

04 Apr 2023

Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

International Conference on Learning Representations (ICLR), 2023

Jun Zhao

505

04 Apr 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

International Conference on Machine Learning (ICML), 2023

...

397

1,641

03 Apr 2023

RPTQ: Reorder-based Post-training Quantization for Large Language Models

593

113

03 Apr 2023

Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

196

03 Apr 2023