ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.14840
  4. Cited By
Identifying and Mitigating the Security Risks of Generative AI

Identifying and Mitigating the Security Risks of Generative AI

28 August 2023
Clark W. Barrett
Bradley L Boyd
Ellie Burzstein
Nicholas Carlini
Brad Chen
Jihye Choi
A. Chowdhury
Mihai Christodorescu
Anupam Datta
S. Feizi
Kathleen Fisher
Tatsunori Hashimoto
Dan Hendrycks
S. Jha
Daniel Kang
Florian Kerschbaum
E. Mitchell
John C. Mitchell
Zulfikar Ramzan
Khawaja Shams
D. Song
Ankur Taly
Diyi Yang
    SILM
ArXivPDFHTML

Papers citing "Identifying and Mitigating the Security Risks of Generative AI"

25 / 25 papers shown
Title
An End-to-End Model For Logits Based Large Language Models Watermarking
An End-to-End Model For Logits Based Large Language Models Watermarking
Kahim Wong
Jicheng Zhou
Jiantao Zhou
Yain-Whar Si
WaLM
30
2
0
05 May 2025
Frontier AI's Impact on the Cybersecurity Landscape
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
50
1
0
07 Apr 2025
The Call for Socially Aware Language Technologies
The Call for Socially Aware Language Technologies
Diyi Yang
Dirk Hovy
David Jurgens
Barbara Plank
VLM
43
11
0
24 Feb 2025
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
58
0
0
18 Feb 2025
PDA: Generalizable Detection of AI-Generated Images via Post-hoc Distribution Alignment
PDA: Generalizable Detection of AI-Generated Images via Post-hoc Distribution Alignment
Li Wang
Wenyu Chen
Zheng Li
Shanqing Guo
34
0
0
15 Feb 2025
A Bias-Free Training Paradigm for More General AI-generated Image Detection
A Bias-Free Training Paradigm for More General AI-generated Image Detection
Fabrizio Guillaro
Giada Zingarini
Ben Usman
Avneesh Sud
D. Cozzolino
L. Verdoliva
DiffM
59
3
0
23 Dec 2024
Conceptwm: A Diffusion Model Watermark for Concept Protection
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
74
2
0
18 Nov 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
CLEAR: Character Unlearning in Textual and Visual Modalities
CLEAR: Character Unlearning in Textual and Visual Modalities
Alexey Dontsov
Dmitrii Korzh
Alexey Zhavoronkin
Boris Mikheev
Denis Bobkov
Aibek Alanov
Oleg Y. Rogov
Ivan V. Oseledets
Elena Tutubalina
AILaw
VLM
MU
66
5
0
23 Oct 2024
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan
Jiancheng Liu
Licong Lin
Jinghan Jia
Ruiqi Zhang
Song Mei
Sijia Liu
MU
41
15
0
09 Oct 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
Yibo Miao
Yifan Zhu
Yinpeng Dong
Lijia Yu
Jun Zhu
Xiao-Shan Gao
EGVM
31
12
0
08 Jul 2024
Securing the Future of GenAI: Policy and Technology
Securing the Future of GenAI: Policy and Technology
Mihai Christodorescu
Craven
S. Feizi
Neil Zhenqiang Gong
Mia Hoffmann
...
Jessica Newman
Emelia Probasco
Yanjun Qi
Khawaja Shams
Turek
SILM
26
3
0
21 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
24
36
0
06 May 2024
SOUL: Unlocking the Power of Second-Order Optimization for LLM
  Unlearning
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia
Yihua Zhang
Yimeng Zhang
Jiancheng Liu
Bharat Runwal
James Diffenderfer
B. Kailkhura
Sijia Liu
MU
24
31
0
28 Apr 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Gauthier Gidel
Stephan Gunnemann
AAML
47
36
0
14 Feb 2024
Kernel PCA for Out-of-Distribution Detection
Kernel PCA for Out-of-Distribution Detection
Kun Fang
Qinghua Tao
Kexin Lv
M. He
Xiaolin Huang
Jie-jin Yang
OODD
39
2
0
05 Feb 2024
Large Scale Foundation Models for Intelligent Manufacturing
  Applications: A Survey
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang
S. D. Semujju
Zhicheng Wang
Xianwei Lv
Kang Xu
...
Jing Wu
Zhuo Long
Wensheng Liang
Xiaoguang Ma
Ruiyan Zhuang
UQCV
AI4TS
AI4CE
27
4
0
11 Dec 2023
Warfare:Breaking the Watermark Protection of AI-Generated Content
Warfare:Breaking the Watermark Protection of AI-Generated Content
Guanlin Li
Yifei Chen
Jie M. Zhang
Shangwei Guo
Shangwei Guo
Tianwei Zhang
Jiwei Li
Tianwei Zhang
WIGM
53
3
0
27 Sep 2023
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
229
2,413
0
06 Oct 2022
Data Feedback Loops: Model-driven Amplification of Dataset Biases
Data Feedback Loops: Model-driven Amplification of Dataset Biases
Rohan Taori
Tatsunori B. Hashimoto
64
42
0
08 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Teaching language models to support answers with verified quotes
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
229
255
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Challenges in Detoxifying Language Models
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
191
0
15 Sep 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020
1