Concrete Problems in AI Safety, Revisited


18 December 2023
Inioluwa Deborah Raji
Roel Dobbe

Papers citing "Concrete Problems in AI Safety, Revisited"

7 / 7 papers shown
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger
Deb Raji
Hanna M. Wallach
Margaret Mitchell
Angelina Wang
Olawale Salaudeen
Rishi Bommasani
Sayash Kapoor
Deep Ganguli
Sanmi Koyejo
EGVM
ELM
72
4
0
07 Mar 2025
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Adam Dahlgren Lindstrom
Leila Methnani
Lea Krause
Petter Ericson
Ínigo Martínez de Rituerto de Troya
Dimitri Coelho Mollo
Roel Dobbe
ALM
50
2
0
26 Jun 2024
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Xiaochen Li
Zheng-Xin Yong
Stephen H. Bach
CLL
34
14
0
23 Jun 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
Rong Bao
Rui Zheng
Shihan Dou
Xiao Wang
Enyu Zhou
Bo Wang
Qi Zhang
Liang Ding
Dacheng Tao
ALM
50
0
0
17 Jun 2024
AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi
Yangsibo Huang
Yi Zeng
Edoardo Debenedetti
Jonas Geiping
...
Chaowei Xiao
Bo-wen Li
Dawn Song
Peter Henderson
Prateek Mittal
AAML
53
11
0
29 May 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
73
30
0
02 Feb 2024
System Safety and Artificial Intelligence
Roel Dobbe
25
34
0
18 Feb 2022