A General Language Assistant as a Laboratory for Alignment

1 December 2021
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
T. Henighan
Andy Jones
Nicholas Joseph
Benjamin Mann
Nova DasSarma
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
John Kernion
Kamal Ndousse
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
    ALM

Papers citing "A General Language Assistant as a Laboratory for Alignment"

50 / 698 papers shown
Title
BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models
Xiangru Tang
Bill Qian
Rick Gao
Jiakang Chen
Xinyun Chen
Mark B. Gerstein
410
27
0
31 Aug 2023
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
International Conference on Learning Representations (ICLR), 2023
Hritik Bansal
John Dang
Aditya Grover
ALM
206
25
0
30 Aug 2023
Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Jingyan Zhou
Minda Hu
Junan Li
Xiaoying Zhang
Xixin Wu
Irwin King
Helen M. Meng
LRM
210
38
0
29 Aug 2023
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
Proceedings of the VLDB Endowment (PVLDB), 2023
Dawei Gao
Haibin Wang
Yaliang Li
Xiuyu Sun
Yichen Qian
Bolin Ding
Jingren Zhou
AI4TS
462
447
0
29 Aug 2023
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park
Simon Goldstein
Aidan O'Gara
Michael Chen
Dan Hendrycks
239
229
0
28 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yongfeng Zhang
Xing Xie
ALM
329
55
0
23 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
293
205
0
18 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
367
97
0
12 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM, ELM
251
15
0
09 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
394
93
0
07 Aug 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM, ELM, AILaw
250
246
0
02 Aug 2023
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jun Yan
Vikas Yadav
Shiyang Li
Lichang Chen
Zheng Tang
Hai Wang
Vijay Srinivasan
Xiang Ren
Hongxia Jin
SILM
223
146
0
31 Jul 2023
Evaluating the Moral Beliefs Encoded in LLMs
Neural Information Processing Systems (NeurIPS), 2023
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
207
191
0
26 Jul 2023
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
Kevin Kaichuang Yang
Dan Klein
Asli Celikyilmaz
Nanyun Peng
Yuandong Tian
ALM
340
37
0
24 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
International Conference on Learning Representations (ICLR), 2023
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
495
143
0
20 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
5.3K
14,891
0
18 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Wanrong Zhu
Heng-Chiao Huang
Hongxia Jin
ALM
287
238
0
17 Jul 2023
In-context Autoencoder for Context Compression in a Large Language Model
International Conference on Learning Representations (ICLR), 2023
Tao Ge
Jing Hu
Lei Wang
Xun Wang
Si-Qing Chen
Furu Wei
RALM
354
116
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
772
1,090
0
12 Jul 2023
Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng
Jiajun Sun
Songyang Gao
Yuan Hua
Wei Shen
...
Hang Yan
Tao Gui
Tao Gui
Xipeng Qiu
Xuanjing Huang
ALM, OffRL
247
226
0
11 Jul 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Neural Information Processing Systems (NeurIPS), 2023
Jiaming Ji
Mickel Liu
Juntao Dai
Xuehai Pan
Chi Zhang
Ce Bian
Chi Zhang
Ruiyang Sun
Yizhou Wang
Yaodong Yang
ALM
259
689
0
10 Jul 2023
Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
International Conference on Machine Learning (ICML), 2023
Aaron J. Li
Robin Netzorg
Zhihan Cheng
Zhuoqin Zhang
Bin Yu
227
4
0
08 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
319
150
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
584
2,627
0
06 Jul 2023
Style Over Substance: Evaluation Biases for Large Language Models
International Conference on Computational Linguistics (COLING), 2023
Minghao Wu
Alham Fikri Aji
ALM, ELM
554
61
0
06 Jul 2023
Scaling Laws Do Not Scale
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
Fernando Diaz
Michael A. Madaio
297
14
0
05 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM, LRM
135
20
0
05 Jul 2023
Trainable Transformer in Transformer
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Sadhika Malladi
Mengzhou Xia
Sanjeev Arora
VLM
292
13
0
03 Jul 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
400
8
0
01 Jul 2023
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
187
67
0
30 Jun 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
243
326
0
28 Jun 2023
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Yufei Huang
Deyi Xiong
ALM
211
23
0
28 Jun 2023
System-Level Natural Language Feedback
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Weizhe Yuan
Dong Wang
Jason Weston
281
5
0
23 Jun 2023
Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US
J. H. Rystrøm
103
0
0
22 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shizhe Diao
Boyao Wang
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
254
73
0
21 Jun 2023
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T. Small
Ivan Vendrov
Esin Durmus
Hadjar Homaei
Elizabeth Barry
Julien Cornebise
Ted Suzman
Deep Ganguli
Colin Megill
164
48
0
20 Jun 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang
Qihui Zhang
Philip S. Yu
Lichao Sun
177
62
0
20 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
228
173
0
15 Jun 2023
Propagating Knowledge Updates to LMs Through Distillation
Neural Information Processing Systems (NeurIPS), 2023
Shankar Padmanabhan
Yasumasa Onoe
Michael J.Q. Zhang
Greg Durrett
Eunsol Choi
KELM
220
21
0
15 Jun 2023
FLamE: Few-shot Learning from Natural Language Explanations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yangqiaoyu Zhou
Yiming Zhang
Chenhao Tan
LRM, FAtt
233
13
0
13 Jun 2023
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
156
73
0
07 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Neural Information Processing Systems (NeurIPS), 2023
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
295
197
0
07 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Neural Information Processing Systems (NeurIPS), 2023
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM, HILM
619
795
0
06 Jun 2023
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Subhabrata Mukherjee
Arindam Mitra
Ganesh Jawahar
Sahaj Agarwal
Hamid Palangi
Ahmed Hassan Awadallah
ELM, ALM, LRM
378
336
0
05 Jun 2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
OSLM
217
47
0
04 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Neural Information Processing Systems (NeurIPS), 2023
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA, MedIm
226
1,232
0
01 Jun 2023
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM
770
2,080
0
31 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG, ELM, LRM, ReLM
144
124
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
International Conference on Learning Representations (ICLR), 2023
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
204
88
0
26 May 2023
Heterogeneous Value Alignment Evaluation for Large Language Models
Artificial General Intelligence (AGI), 2023
Zhaowei Zhang
Ceyao Zhang
N. Liu
Siyuan Qi
Ziqi Rong
Song-Chun Zhu
Shuguang Cui
Yaodong Yang
302
6
0
26 May 2023