A General Language Assistant as a Laboratory for Alignment

1 December 2021

Deep Ganguli

Papers citing "A General Language Assistant as a Laboratory for Alignment"

50 / 701 papers shown

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

...

Hang Su

Jun Zhu

Lei Zhang

Jianfeng Gao

Chun-yue Li

MLLM VLM

272

190

09 Nov 2023

Unveiling Safety Vulnerabilities of Large Language Models

254

07 Nov 2023

FinGPT: Large Generative Models for a Small Language

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

185

03 Nov 2023

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

Nathan Lambert

Roberto Calandra

ALM

465

31 Oct 2023

Automatic Evaluation of Generative Models with Instruction Tuning

IEEE Games Entertainment Media Conference (IEEE GEM), 2023

Shuhaib Mehri

Vered Shwartz

ELM ALM

136

30 Oct 2023

Personas as a Way to Model Truthfulness in Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

395

27 Oct 2023

Unpacking the Ethical Value Alignment in Big Models

Xiaoyuan Yi

Jing Yao

Xiting Wang

Xing Xie

185

26 Oct 2023

SuperHF: Supervised Iterative Learning from Human Feedback

262

25 Oct 2023

OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models

Dayiheng Liu

Chang Zhou

Jingren Zhou

LLMAG

176

25 Oct 2023

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications

Social Science Research Network (SSRN), 2023

Abhilash Mishra

24 Oct 2023

Self-Guard: Empower the LLM to Safeguard Itself

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Zezhong Wang

270

24 Oct 2023

Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Yanchen Liu

Srishti Gautam

Jiaqi Ma

Himabindu Lakkaraju

LMTD

218

23 Oct 2023

AlpaCare: Instruction-tuned Large Language Models for Medical Application

460

23 Oct 2023

From the Pursuit of Universal AGI Architecture to Systematic Approach to Heterogenous AGI: Addressing Alignment, Energy, & AGI Grand Challenges

International Journal of Semantic Computing (IJSC), 2023

Eren Kurshan

413

23 Oct 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

283

22 Oct 2023

Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Karina Vida

Judith Simon

Anne Lauscher

243

21 Oct 2023

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Jiaming Ji

399

537

19 Oct 2023

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

International Conference on Learning Representations (ICLR), 2023

Wei Shen

...

Xuanjing Huang

285

18 Oct 2023

Group Preference Optimization: Few-Shot Alignment of Large Language Models

International Conference on Learning Representations (ICLR), 2023

Siyan Zhao

John Dang

Aditya Grover

345

17 Oct 2023

RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xuanjing Huang

159

17 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions

441

16 Oct 2023

The Consensus Game: Language Model Generation via Equilibrium Search

246

13 Oct 2023

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

Neural Information Processing Systems (NeurIPS), 2023

498

13 Oct 2023

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

International Conference on Learning Representations (ICLR), 2023

...

522

372

12 Oct 2023

Evaluating Large Language Models at Evaluating Instruction Following

International Conference on Learning Representations (ICLR), 2023

412

264

11 Oct 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Paul Röttger

358

11 Oct 2023

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

International Conference on Learning Representations (ICLR), 2023

255

405

10 Oct 2023

MetaAgents: Large Language Model Based Agents for Decision-Making on Teaming

383

119

10 Oct 2023

SALMON: Self-Alignment with Instructable Reward Models

International Conference on Learning Representations (ICLR), 2023

Chuang Gan

353

09 Oct 2023

A Closer Look into Automatic Evaluation Using Large Language Models

Cheng-Han Chiang

Hunghuei Lee

ELM ALM LM&MA

142

09 Oct 2023

Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures

Thorsten Händler

LLMAG

206

05 Oct 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

Hannah Rose Kirk

Bertie Vidgen

Paul Röttger

Scott A. Hale

385

03 Oct 2023

Ask Again, Then Fail: Large Language Models' Vacillations in Judgment

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

653

03 Oct 2023

Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

280

03 Oct 2023

Tool-Augmented Reward Modeling

International Conference on Learning Representations (ICLR), 2023

Lei Li

259

02 Oct 2023

Enabling Language Models to Implicitly Learn Self-Improvement

Heng Ji

279

02 Oct 2023

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

International Conference on Learning Representations (ICLR), 2023

428

01 Oct 2023

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

International Conference on Learning Representations (ICLR), 2023

Amita Gajewar

Paul Vicol

G. Bansal

David J Fleet

267

300

29 Sep 2023

Qwen Technical Report

Jinze Bai

Shuai Bai

Yunfei Chu

Zeyu Cui

Kai Dang

...

Zhenru Zhang

Chang Zhou

Jingren Zhou

Xiaohuan Zhou

Tianhang Zhu

OSLM

797

3,067

28 Sep 2023

GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond

Yuyu Zhang

Pengyang Gao

Kevin Chen-Chuan Chang

ELM

355

28 Sep 2023

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

International Conference on Learning Representations (ICLR), 2023

255

145

28 Sep 2023

The Trickle-down Impact of Reward (In-)consistency on RLHF

Lingfeng Shen

Linfeng Song

Daniel Khashabi

Dong Yu

251

28 Sep 2023

Large Language Model Alignment: A Survey

359

282

26 Sep 2023

Aligning Large Multimodal Models with Factually Augmented RLHF

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

285

592

25 Sep 2023

Can LLM-Generated Misinformation Be Detected?

International Conference on Learning Representations (ICLR), 2023

Canyu Chen

Kai Shu

DeLMO

782

241

25 Sep 2023

Stabilizing RLHF through Advantage Model and Selective Rehearsal

Linfeng Song

Dong Yu

190

18 Sep 2023

RAIN: Your Language Models Can Align Themselves without Finetuning

International Conference on Learning Representations (ICLR), 2023

295

157

13 Sep 2023

Mitigating the Alignment Tax of RLHF

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yong Lin

Wei Xiong

...

Tong Zhang

666

129

12 Sep 2023

Everyone Deserves A Reward: Learning Customized Human Preferences

213

06 Sep 2023

Data-Juicer: A One-Stop Data Processing System for Large Language Models

...

Jingren Zhou

297

05 Sep 2023