A General Language Assistant as a Laboratory for Alignment

1 December 2021
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
T. Henighan
Andy Jones
Nicholas Joseph
Benjamin Mann
Nova Dassarma
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
John Kernion
Kamal Ndousse
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
    ALM

Papers citing "A General Language Assistant as a Laboratory for Alignment"

50 / 701 papers shown
WARM: On the Benefits of Weight Averaged Reward Models
International Conference on Machine Learning (ICML), 2024
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
355
130
0
22 Jan 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Pengyu Wang
Dong Zhang
Linyang Li
Chenkun Tan
Xinghao Wang
Ke Ren
Botian Jiang
Xipeng Qiu
LLMSV
279
71
0
20 Jan 2024
Large-scale Reinforcement Learning for Diffusion Models
European Conference on Computer Vision (ECCV), 2024
Yinan Zhang
Eric Tzeng
Yilun Du
Dmitry Kislyuk
VLM
264
68
0
20 Jan 2024
Reinforcement learning for question answering in programming domain using public community scoring as a human feedback
Alexey Gorbatovski
Sergey Kovalchuk
40
6
0
19 Jan 2024
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhengxin Zhang
Dan Zhao
Xupeng Miao
Qing Li
Yong Jiang
Zhihao Jia
MQ
198
11
0
13 Jan 2024
Towards Conversational Diagnostic AI
Tao Tu
Anil Palepu
M. Schaekermann
Khaled Saab
Jan Freyberg
...
Katherine Chou
Greg S. Corrado
Yossi Matias
Alan Karthikesalingam
Vivek Natarajan
AI4MH, LM&MA
257
140
0
11 Jan 2024
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
International Conference on Machine Learning (ICML), 2024
Mahdi Nikdan
Soroush Tabesh
Elvir Crnčević
Dan Alistarh
502
45
0
09 Jan 2024
Agent Alignment in Evolving Social Norms
Shimin Li
Tianxiang Sun
Qinyuan Cheng
Xipeng Qiu
LLMAG
298
12
0
09 Jan 2024
MERA: A Comprehensive LLM Evaluation in Russian
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Sergey Petrakov
Sergey Markov
ELM
271
31
0
09 Jan 2024
A Philosophical Introduction to Language Models -- Part I: Continuity With Classic Debates
Raphael Milliere
Cameron Buckner
LRM, ELM
191
38
0
08 Jan 2024
InFoBench: Evaluating Instruction Following Ability in Large Language Models
Yiwei Qin
Kaiqiang Song
Yebowen Hu
Wenlin Yao
Sangwoo Cho
Xiaoyang Wang
Xuansheng Wu
Fei Liu
Pengfei Liu
Dong Yu
ELM
232
88
0
07 Jan 2024
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Xiaoding Lu
Zongyi Liu
Adian Liusie
Vyas Raina
Vineet Mudupalli
Yuwen Zhang
W. Beauchamp
268
29
0
04 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xiaoyan Cai
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
464
123
0
04 Jan 2024
Align on the Fly: Adapting Chatbot Behavior to Established Norms
Chunpu Xu
Steffi Chern
Ethan Chern
Ge Zhang
Zekun Wang
Ruibo Liu
Jing Li
Jie Fu
Pengfei Liu
181
23
0
26 Dec 2023
Learning and Forgetting Unsafe Examples in Large Language Models
Jiachen Zhao
Zhun Deng
David Madras
James Zou
Mengye Ren
MU, KELM, CLL
359
24
0
20 Dec 2023
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan
Shiwei Zhang
Xiang Wang
Yujie Wei
Tao Feng
Yining Pan
Yingya Zhang
Ziwei Liu
Samuel Albanie
Dong Ni
VGen
259
79
0
19 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
374
294
0
18 Dec 2023
Challenges with unsupervised LLM knowledge discovery
Sebastian Farquhar
Vikrant Varma
Zachary Kenton
Johannes Gasteiger
Vladimir Mikulik
Rohin Shah
310
34
0
15 Dec 2023
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
International Conference on Learning Representations (ICLR), 2023
Anand Siththaranjan
Cassidy Laidlaw
Dylan Hadfield-Menell
460
92
0
13 Dec 2023
AI capabilities can be significantly improved without expensive retraining
Tom Davidson
Jean-Stanislas Denain
Pablo Villalobos
Guillem Bas
OffRL, VLM
236
31
0
12 Dec 2023
On Diversified Preferences of Large Language Model Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Dun Zeng
Yong Dai
Pengyu Cheng
Longyue Wang
Tianhao Hu
Wanshun Chen
Nan Du
Zenglin Xu
ALM
388
21
0
12 Dec 2023
Alignment for Honesty
Neural Information Processing Systems (NeurIPS), 2023
Yuqing Yang
Ethan Chern
Xipeng Qiu
Graham Neubig
Pengfei Liu
258
58
0
12 Dec 2023
Control Risk for Potential Misuse of Artificial Intelligence in Science
Jiyan He
Weitao Feng
Yaosen Min
Jingwei Yi
Kunsheng Tang
...
Wenbo Zhou
Xing Xie
Weiming Zhang
Neng H. Yu
Shuxin Zheng
215
15
0
11 Dec 2023
Steering Llama 2 via Contrastive Activation Addition
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Nina Rimsky
Nick Gabrieli
Julian Schulz
Meg Tong
Evan Hubinger
Alexander Matt Turner
LLMSV
445
446
0
09 Dec 2023
Language Model Alignment with Elastic Reset
Michael Noukhovitch
Samuel Lavoie
Florian Strub
Aaron Courville
KELM
324
35
0
06 Dec 2023
ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference
Tianchi Cai
Xierui Song
Jiyan Jiang
Fei Teng
Jinjie Gu
Guannan Zhang
ALM
197
8
0
05 Dec 2023
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
International Conference on Learning Representations (ICLR), 2023
Renze Lou
Kai Zhang
Jian Xie
Yuxuan Sun
Janice Ahn
Hanzi Xu
Yu Su
Wenpeng Yin
265
36
0
05 Dec 2023
Personality of AI
International Conference on Artificial Intelligence and Soft Computing (ICAISC), 2023
Byunggu Yu
Junwhan Kim
169
2
0
03 Dec 2023
Axiomatic Preference Modeling for Longform Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Corby Rosset
Guoqing Zheng
Victor C. Dibia
Ahmed Hassan Awadallah
Paul Bennett
SyDa
150
7
0
02 Dec 2023
TaskWeaver: A Code-First Agent Framework
Bo Qiao
Liqun Li
Xu Zhang
Shilin He
Yu Kang
...
Chao Du
Yong Xu
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
264
56
0
29 Nov 2023
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
M. Boubdir
Edward Kim
Beyza Ermis
Sara Hooker
Marzieh Fadaee
ELM
224
66
0
29 Nov 2023
Adversarial Diffusion Distillation
European Conference on Computer Vision (ECCV), 2023
Axel Sauer
Dominik Lorenz
A. Blattmann
Robin Rombach
884
603
0
28 Nov 2023
Foundational Moral Values for AI Alignment
Betty Hou
Brian Patrick Green
177
1
0
28 Nov 2023
CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models
Yuhang Wang
Yanxu Zhu
Chao Kong
Shuyu Wei
Xiaoyuan Yi
Xing Xie
Jitao Sang
ALM, VLM, ELM
169
16
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
452
23
0
28 Nov 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Kevin Liu
Stephen Casper
Dylan Hadfield-Menell
Jacob Andreas
HILM
265
51
0
27 Nov 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
979
1,953
0
25 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
European Conference on Computer Vision (ECCV), 2023
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
291
29
0
23 Nov 2023
Diffusion Model Alignment Using Direct Preference Optimization
Computer Vision and Pattern Recognition (CVPR), 2023
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
449
516
0
21 Nov 2023
Case Repositories: Towards Case-Based Reasoning for AI Alignment
K. J. Kevin Feng
Quan Ze Chen
Inyoung Cheong
King Xia
Amy X. Zhang
167
13
0
18 Nov 2023
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
439
99
0
16 Nov 2023
Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking
Nan Xu
Fei Wang
Ben Zhou
Bangzheng Li
Chaowei Xiao
Muhao Chen
360
85
0
16 Nov 2023
LifeTox: Unveiling Implicit Toxicity in Life Advice
Minbeom Kim
Jahyun Koo
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
307
11
0
16 Nov 2023
An Empathetic User-Centric Chatbot for Emotional Support
Yanting Pan
Yixuan Tang
Yuchen Niu
91
6
0
15 Nov 2023
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
299
40
0
15 Nov 2023
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pengyu Cheng
Yifan Yang
Jian Li
Yong Dai
Tianhao Hu
Peixin Cao
Nan Du
Xiaolong Li
706
34
0
14 Nov 2023
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
365
7
0
13 Nov 2023
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Suyu Ge
Chunting Zhou
Rui Hou
Madian Khabsa
Yi-Chia Wang
Qifan Wang
Jiawei Han
Yuning Mao
AAML, LRM
212
150
0
13 Nov 2023
Psychometric Predictive Power of Large Language Models
Tatsuki Kuribayashi
Yohei Oseki
Timothy Baldwin
LM&MA
277
7
0
13 Nov 2023
Flames: Benchmarking Value Alignment of LLMs in Chinese
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Kexin Huang
Xiangyang Liu
Qianyu Guo
Tianxiang Sun
Jiawei Sun
...
Yixu Wang
Yan Teng
Xipeng Qiu
Yingchun Wang
Dahua Lin
ALM
412
30
0
12 Nov 2023