2008.02275
Cited By
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing
"Aligning AI With Shared Human Values"
50 / 463 papers shown
Inducing Human-like Biases in Moral Reasoning Language Models
Artem Karpov
Seong Hah Cho
Austin Meek
Raymond Koopmanschap
Lucy Farnik
Bogdan-Ionut Cirstea
LRM
235
1
0
23 Nov 2024
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Haonan Wang
Qian Liu
Chao Du
Tongyao Zhu
Cunxiao Du
Kenji Kawaguchi
Tianyu Pang
469
10
0
20 Nov 2024
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang
Yulu Niki Pi
Carlos Mougan
234
3
0
18 Nov 2024
Value Imprint: A Technique for Auditing the Human Values Embedded in RLHF Datasets
Neural Information Processing Systems (NeurIPS), 2024
Ike Obi
Rohan Pant
Srishti Shekhar Agrawal
Maham Ghazanfar
Aaron Basiletti
230
8
0
18 Nov 2024
Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Antonia Karamolegkou
Sandrine Schiller Hansen
Ariadni Christopoulou
Filippos Stamatiou
Anne Lauscher
Anders Søgaard
170
0
0
12 Nov 2024
Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications
Xianzhe Fan
Qing Xiao
Xuhui Zhou
Yuran Su
Zhicong Lu
Maarten Sap
Hong Shen
188
1
0
11 Nov 2024
Evaluating Moral Beliefs across LLMs through a Pluralistic Framework
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xuelin Liu
Yanfei Zhu
Shucheng Zhu
Pengyuan Liu
Ying Liu
Dong Yu
265
7
0
06 Nov 2024
MoD: A Distribution-Based Approach for Merging Large Language Models
Quy-Anh Dang
Chris Ngo
MoMe
VLM
250
0
0
01 Nov 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
618
10
0
30 Oct 2024
Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation
Yifang Chen
David Zhu
SyDa
135
0
0
27 Oct 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
704
12
0
26 Oct 2024
From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages
Artur Kiulian
Anton Polishko
M. Khandoga
Yevhen Kostiuk
Guillermo Gabrielli
...
Hrishikesh Garud
Wendy Wing Yee Mak
Dmytro Chaplynskyi
Selma Belhadj Amor
Grigol Peradze
212
4
0
24 Oct 2024
PLDR-LLM: Large Language Model from Power Law Decoder Representations
Burc Gokden
143
2
0
22 Oct 2024
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
BigData Congress [Services Society] (BSS), 2024
Norbert Tihanyi
Tamás Bisztray
Richard A. Dubniczky
Rebeka Tóth
B. Borsos
...
Ryan Marinelli
Lucas C. Cordeiro
Merouane Debbah
Vasileios Mavroeidis
Audun Josang
262
9
0
20 Oct 2024
Speciesism in Natural Language Processing Research
AI and Ethics (AI & Ethics), 2024
Masashi Takeshita
Rafal Rzepka
216
7
0
18 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Bjorn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
262
3
0
17 Oct 2024
BenTo: Benchmark Task Reduction with In-Context Transferability
Hongyu Zhao
Ming Li
Lichao Sun
Tianyi Zhou
289
2
0
17 Oct 2024
Learning to Route LLMs with Confidence Tokens
Yu-Neng Chuang
Helen Zhou
Prathusha Kameswara Sarma
Parikshit Gopalan
John Boccio
Sara Bolouki
285
0
0
17 Oct 2024
LLM-Human Pipeline for Cultural Context Grounding of Conversations
Rajkumar Pujari
Dan Goldwasser
261
2
0
17 Oct 2024
Adapt-∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
International Conference on Learning Representations (ICLR), 2024
A. Maharana
Jaehong Yoon
Tianlong Chen
Joey Tianyi Zhou
312
0
0
14 Oct 2024
Evaluating Gender Bias of LLMs in Making Morality Judgements
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Divij Bajaj
Yuanyuan Lei
Jonathan Tong
Ruihong Huang
160
11
0
13 Oct 2024
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Anvesh Rao Vijjini
Rakesh R Menon
Jiayi Fu
Shashank Srivastava
Snigdha Chaturvedi
ALM
219
4
0
11 Oct 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb
Fabien Roger
AAML
MU
424
42
0
11 Oct 2024
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations
Nathalie Maria Kirch
Konstantin Hebenstreit
Matthias Samwald
194
4
0
10 Oct 2024
Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses
Pranav Senthilkumar
Visshwa Balasubramanian
Prisha Jain
Aneesa Maity
Jonathan Lu
Kevin Zhu
145
3
0
10 Oct 2024
The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
Basile Garcia
Crystal Qian
Stefano Palminteri
ELM
218
13
0
09 Oct 2024
Scaling Laws For Mixed Quantization
Zeyu Cao
Boyang Gu
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Xitong Gao
Yiren Zhao
MQ
347
1
0
09 Oct 2024
Intuitions of Compromise: Utilitarianism vs. Contractualism
Jared Moore
Yejin Choi
Sydney Levine
230
1
0
07 Oct 2024
Unlocking Structured Thinking in Language Models with Cognitive Prompting
Oliver Kramer
Jill Baumann
ReLM
LRM
272
9
0
03 Oct 2024
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
International Conference on Learning Representations (ICLR), 2024
Yu Ying Chiu
Liwei Jiang
Yejin Choi
325
25
0
03 Oct 2024
Examining the Role of Relationship Alignment in Large Language Models
Kristen M. Altenburger
Hongda Jiang
Robert E. Kraut
Yi-Chia Wang
Jane Dwivedi-Yu
179
1
0
02 Oct 2024
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
439
5
0
02 Oct 2024
Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability
Weitong Zhang
Chengqi Zang
Bernhard Kainz
218
1
0
01 Oct 2024
Predicting memorization within Large Language Models fine-tuned for classification
Jérémie Dentan
Davide Buscaldi
A. Shabou
Sonia Vanier
336
1
0
27 Sep 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
International Conference on Learning Representations (ICLR), 2024
Zeyu Huang
Zihan Qiu
Zili Wang
Edoardo M. Ponti
Ivan Titov
299
12
0
25 Sep 2024
JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models
International Conference on Computational Linguistics (COLING), 2024
Junfeng Jiang
Jiahao Huang
Akiko Aizawa
LM&MA
192
7
0
20 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
The Web Conference (WWW), 2024
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
371
6
0
19 Sep 2024
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen
Tiffany Knearem
Reshmi Ghosh
Yu-Ju Yang
Nicholas Clark
Tanushree Mitra
Yun Huang
326
0
0
15 Sep 2024
DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning
Keer Lu
Xiaonan Nie
Zhuoran Zhang
Zheng Liang
Da Pan
...
Weipeng Chen
Guosheng Dong
Bin Cui
Wentao Zhang
217
2
0
02 Sep 2024
ToolACE: Winning the Points of LLM Function Calling
International Conference on Learning Representations (ICLR), 2024
Weiwen Liu
Xiaolin Huang
Xingshan Zeng
Xinlong Hao
Shuai Yu
...
Xin Jiang
Ruiming Tang
Defu Lian
Qun Liu
Tong Xu
LLMAG
303
112
0
02 Sep 2024
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning
Maxime Méloux
Christophe Cerisara
KELM
CLL
261
1
0
30 Aug 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
International Conference on Learning Representations (ICLR), 2024
Wenxuan Zhang
Juil Sock
Mohamed Elhoseiny
Adel Bibi
494
23
0
27 Aug 2024
Investigating LLM Applications in E-Commerce
Chester Palen-Michel
Ruixiang Wang
Yipeng Zhang
David Yu
Canran Xu
Zhe Wu
162
11
0
23 Aug 2024
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
International Conference on Pattern Recognition (ICPR), 2024
Muhammad Rafsan Kabir
Rafeed Mohammad Sultan
Ihsanul Haque Asif
Jawad Ibn Ahad
Fuad Rahman
Mohammad Ruhul Amin
Nabeel Mohammed
Shafin Rahman
LRM
188
7
0
20 Aug 2024
Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory
Yongxin Deng
Xihe Qiu
Jue Chen
Jing Pan
Chen Jue
Zhijun Fang
Yinghui Xu
Wei Chu
Yuan Qi
213
3
0
20 Aug 2024
Value Alignment from Unstructured Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Inkit Padhi
Karthikeyan N. Ramamurthy
P. Sattigeri
Manish Nagireddy
Pierre Dognin
Kush R. Varshney
227
0
0
19 Aug 2024
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Linhao Yu
Yongqi Leng
Yufei Huang
Shang Wu
Haixin Liu
...
Jinwang Song
Tingting Cui
Xiaoqing Cheng
Tao Liu
Deyi Xiong
ELM
130
9
0
19 Aug 2024
How Well Do LLMs Identify Cultural Unity in Diversity?
Jialin Li
Junli Wang
Junjie Hu
Ming Jiang
228
8
0
09 Aug 2024
Prompt and Prejudice
Lorenzo Berlincioni
Luca Cultrera
Federico Becattini
Marco Bertini
225
1
0
07 Aug 2024
Pula: Training Large Language Models for Setswana
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Nathan Brown
Vukosi Marivate
OSLM
317
0
0
05 Aug 2024