ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2109.13916
  4. Cited By
Unsolved Problems in ML Safety

Unsolved Problems in ML Safety

28 September 2021
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
ArXivPDFHTML

Papers citing "Unsolved Problems in ML Safety"

10 / 10 papers shown
Title
What Is AI Safety? What Do We Want It to Be?
What Is AI Safety? What Do We Want It to Be?
Jacqueline Harding
Cameron Domenico Kirk-Giannini
11
0
0
05 May 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
55
0
0
25 Apr 2025
Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Tri Nguyen
Lohith Srikanth Pentapalli
Magnus Sieverding
Laurah Turner
Seth Overla
...
Michael Gharib
Matt Kelleher
Michael Shukis
Cameron Pawlik
Kelly Cohen
9
0
0
21 Apr 2025
Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Qizhang Feng
Siva Rajesh Kasa
Santhosh Kumar Kasa
Hyokun Yun
C. Teo
S. Bodapati
50
5
0
08 Jul 2024
Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and
  Results
Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results
Mohamad H. Danesh
Alan Fern
79
10
0
11 Jul 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
260
4,299
0
29 Apr 2021
Measuring and Improving Consistency in Pretrained Language Models
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
226
273
0
01 Feb 2021
RobustBench: a standardized adversarial robustness benchmark
RobustBench: a standardized adversarial robustness benchmark
Francesco Croce
Maksym Andriushchenko
Vikash Sehwag
Edoardo Debenedetti
Nicolas Flammarion
M. Chiang
Prateek Mittal
Matthias Hein
VLM
190
554
0
19 Oct 2020
AI safety via debate
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
183
148
0
02 May 2018
Simple and Scalable Predictive Uncertainty Estimation using Deep
  Ensembles
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
251
4,940
0
05 Dec 2016
1