Ethical and social risks of harm from Language Models


8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
    PILM
arXiv:2112.04359

Papers citing "Ethical and social risks of harm from Language Models"

50 / 172 papers shown
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Pedro M. P. Curvo
Mara Dragomir
Salvador Torpes
Mohammadmahdi Rahimi
LLMAG
11
0
0
14 May 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
28
0
0
12 May 2025
Real-World Gaps in AI Governance Research
Ilan Strauss
Isobel Moure
Tim O'Reilly
Sruly Rosenblat
61
0
0
30 Apr 2025
SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal
Hari Shrawgi
Parag Agrawal
Sandipan Dandapat
ELM
47
0
0
28 Apr 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
76
0
0
27 Apr 2025
AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How
Omid Veisi
Sasan Bahrami
Roman Englert
Claudia Müller
112
0
0
25 Apr 2025
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen
Tairan Wang
Juexiao Zhou
Zirui Song
Xin Gao
X. Zhang
MedIm
39
1
0
24 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
77
0
0
21 Apr 2025
Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Tri Nguyen
Lohith Srikanth Pentapalli
Magnus Sieverding
Laurah Turner
Seth Overla
...
Michael Gharib
Matt Kelleher
Michael Shukis
Cameron Pawlik
Kelly Cohen
51
0
0
21 Apr 2025
Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages
Alessio Buscemi
Cedric Lothritz
Sergio Morales
Marcos Gomez-Vazquez
Robert Clarisó
Jordi Cabot
German Castignani
26
0
0
19 Apr 2025
Demo: ViolentUTF as An Accessible Platform for Generative AI Red Teaming
Tam n. Nguyen
26
0
0
14 Apr 2025
Feature-Aware Malicious Output Detection and Mitigation
Weilong Dong
Peiguang Li
Yu Tian
Xinyi Zeng
Fengdi Li
Sirui Wang
AAML
24
0
0
12 Apr 2025
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu
Arka Dutta
Sean Kim
Ashiqur R. KhudaBukhsh
Munmun De Choudhury
19
0
0
08 Apr 2025
A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf
Trupti Bavalatti
Osama Ahmed
Dhaval Potdar
Faraz Jawed
EGVM
58
1
0
23 Feb 2025
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao
Yanbin Zhao
Juncai He
Yue Zhang
VLM
94
2
0
20 Feb 2025
AI Mimicry and Human Dignity: Chatbot Use as a Violation of Self-Respect
Jan-Willem van der Rijt
Dimitri Coelho Mollo
Bram Vaassen
SILM
44
0
0
17 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
62
0
0
17 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
60
1
0
02 Feb 2025
The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
Peter Hall
Olivia Mundahl
Sunoo Park
71
0
0
30 Jan 2025
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang
H. Shi
Ligong Han
Dimitris N. Metaxas
Hao Wang
BDL
UQLM
110
6
0
28 Jan 2025
Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
57
2
0
20 Jan 2025
Two Types of AI Existential Risk: Decisive and Accumulative
Atoosa Kasirzadeh
57
14
0
20 Jan 2025
Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis
Lanling Xu
Junjie Zhang
Bingqian Li
Jinpeng Wang
Sheng Chen
Wayne Xin Zhao
Ji-Rong Wen
74
18
0
17 Jan 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
46
1
0
08 Jan 2025
LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
Stefan Pasch
38
0
0
04 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
146
0
0
31 Dec 2024
Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models
Adam Davies
Elisa Nguyen
Michael Simeone
Erik Johnston
Martin Gubri
90
0
0
20 Dec 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
54
1
0
03 Nov 2024
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
K. Johnson
Kristen Marie Johnson
LRM
23
0
0
30 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
39
0
0
25 Oct 2024
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo
Yerin Hwang
Yongil Kim
Taegwan Kang
Hyunkyung Bae
Kyomin Jung
55
0
0
25 Oct 2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann
Alexander Spiridonov
Robin Staab
Nikola Jovanović
Mark Vero
...
Mislav Balunović
Nikola Konstantinov
Pavol Bielik
Petar Tsankov
Martin Vechev
ELM
45
4
0
10 Oct 2024
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level
Xinyi Zeng
Yuying Shang
Yutao Zhu
Jingyuan Zhang
Yu Tian
AAML
116
2
0
09 Oct 2024
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
Guanchu Wang
Yu-Neng Chuang
Ruixiang Tang
Shaochen Zhong
Jiayi Yuan
...
Zirui Liu
V. Chaudhary
Shuai Xu
James Caverlee
Xia Hu
PILM
73
1
0
06 Oct 2024
From Pixels to Personas: Investigating and Modeling Self-Anthropomorphism in Human-Robot Dialogues
Yu Li
Devamanyu Hazarika
Di Jin
Julia Hirschberg
Yang Liu
26
0
0
04 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
82
1
0
02 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
45
0
0
02 Oct 2024
Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering
K. K.
Bernhard Schölkopf
Michael Muehlebach
26
0
0
02 Oct 2024
Differentially Private Kernel Density Estimation
Erzhi Liu
Jerry Yao-Chieh Hu
Alex Reneau
Zhao Song
Han Liu
66
3
0
03 Sep 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip H. S. Torr
Mohamed Elhoseiny
Adel Bibi
71
9
0
27 Aug 2024
Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias
Waqar Hussain
38
0
0
16 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
50
5
0
11 Jul 2024
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Adam Dahlgren Lindstrom
Leila Methnani
Lea Krause
Petter Ericson
Ínigo Martínez de Rituerto de Troya
Dimitri Coelho Mollo
Roel Dobbe
ALM
31
2
0
26 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
79
23
0
17 Jun 2024
Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu
Yimin Hu
Hang Cao
Tong Xiao
Jingbo Zhu
LRM
VLM
33
4
0
11 Jun 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
58
12
0
28 May 2024
ReMoDetect: Reward Models Recognize Aligned LLM's Generations
Hyunseok Lee
Jihoon Tack
Jinwoo Shin
DeLMO
38
0
0
27 May 2024
ChatGPT Code Detection: Techniques for Uncovering the Source of Code
Marc Oedingen
Raphael C. Engelhardt
Robin Denz
Maximilian Hammer
Wolfgang Konen
DeLMO
37
8
0
24 May 2024
Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure
Gaurav Maheshwari
A. Bellet
Pascal Denis
Mikaela Keller
46
1
0
23 May 2024
Can AI Relate: Testing Large Language Model Response for Mental Health Support
Saadia Gabriel
Isha Puri
Xuhai Xu
Matteo Malgaroli
Marzyeh Ghassemi
LM&MA
AI4MH
27
11
0
20 May 2024