ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.05197
  4. Cited By
Multi-step Jailbreaking Privacy Attacks on ChatGPT

Multi-step Jailbreaking Privacy Attacks on ChatGPT

11 April 2023
Haoran Li
Dadi Guo
Wei Fan
Mingshi Xu
Jie Huang
Fanpu Meng
Yangqiu Song
    SILM
ArXivPDFHTML

Papers citing "Multi-step Jailbreaking Privacy Attacks on ChatGPT"

35 / 235 papers shown
Title
FLIRT: Feedback Loop In-context Red Teaming
FLIRT: Feedback Loop In-context Red Teaming
Ninareh Mehrabi
Palash Goyal
Christophe Dupuy
Qian Hu
Shalini Ghosh
R. Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
DiffM
16
55
0
08 Aug 2023
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak
  Prompts on Large Language Models
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen
Z. Chen
Michael Backes
Yun Shen
Yang Zhang
SILM
28
241
0
07 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from
  Human Feedback
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
34
468
0
27 Jul 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model
  Chatbots
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
28
118
0
16 Jul 2023
Evade ChatGPT Detectors via A Single Space
Evade ChatGPT Detectors via A Single Space
Shuyang Cai
Wanyun Cui
DeLMO
31
14
0
05 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
26
827
0
05 Jul 2023
Citation: A Key to Building Responsible and Accountable Large Language
  Models
Citation: A Key to Building Responsible and Accountable Large Language Models
Jie Huang
Kevin Chen-Chuan Chang
HILM
33
16
0
05 Jul 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language
  Models
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang
Qihui Zhang
Philip S. Y
Lichao Sun
13
46
0
20 Jun 2023
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest
  Cost
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost
Juexiao Zhou
Xiuying Chen
Xin Gao
LM&MA
AI4CE
85
12
0
19 Jun 2023
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper
Jason Lin
Joe Kwon
Gatlen Culp
Dylan Hadfield-Menell
AAML
8
83
0
15 Jun 2023
Improving Open Language Models by Learning from Organic Interactions
Improving Open Language Models by Learning from Organic Interactions
Jing Xu
Da Ju
Joshua Lane
M. Komeili
Eric Michael Smith
...
Rashel Moritz
Sainbayar Sukhbaatar
Y-Lan Boureau
Jason Weston
Kurt Shuster
17
8
0
07 Jun 2023
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and
  Evaluation through the Lens of Academic Writing
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing
Zeyan Liu
Zijun Yao
Fengjun Li
Bo Luo
DeLMO
14
17
0
07 Jun 2023
Spear or Shield: Leveraging Generative AI to Tackle Security Threats of
  Intelligent Network Services
Spear or Shield: Leveraging Generative AI to Tackle Security Threats of Intelligent Network Services
Hongyang Du
Dusit Niyato
Jiawen Kang
Zehui Xiong
K. Lam
Ya-Nan Fang
Yonghui Li
AAML
21
13
0
04 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark
  Datasets
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
36
175
0
29 May 2023
A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions
A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions
Yuntao Wang
Yanghe Pan
Miao Yan
Zhou Su
Tom H. Luan
17
140
0
25 May 2023
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting
  Jailbreaks
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao
S. Vashistha
Atharva Naik
Somak Aditya
Monojit Choudhury
17
17
0
24 May 2023
The CoT Collection: Improving Zero-shot and Few-shot Learning of
  Language Models via Chain-of-Thought Fine-Tuning
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Seungone Kim
Se June Joo
Doyoung Kim
Joel Jang
Seonghyeon Ye
Jamin Shin
Minjoon Seo
ALM
RALM
LRM
16
55
0
23 May 2023
Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study
Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study
Yi Liu
Gelei Deng
Zhengzi Xu
Yuekang Li
Yaowen Zheng
Ying Zhang
Lida Zhao
Tianwei Zhang
Kailong Wang
Yang Liu
28
423
0
23 May 2023
On the Risk of Misinformation Pollution with Large Language Models
On the Risk of Misinformation Pollution with Large Language Models
Yikang Pan
Liangming Pan
Wenhu Chen
Preslav Nakov
Min-Yen Kan
W. Wang
DeLMO
190
105
0
23 May 2023
Lion: Adversarial Distillation of Proprietary Large Language Models
Lion: Adversarial Distillation of Proprietary Large Language Models
Yuxin Jiang
Chunkit Chan
Mingyang Chen
Wei Wang
ALM
20
23
0
22 May 2023
Quantifying Association Capabilities of Large Language Models and Its
  Implications on Privacy Leakage
Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage
Hanyin Shao
Jie Huang
Shen Zheng
Kevin Chen-Chuan Chang
PILM
8
24
0
22 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through
  the Lens of Verification and Validation
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
27
81
0
19 May 2023
Simulating H.P. Lovecraft horror literature with the ChatGPT large
  language model
Simulating H.P. Lovecraft horror literature with the ChatGPT large language model
E.C. Garrido-Merchán
J. L. Arroyo-Barrigüete
Roberto Gozalo-Brizuela
25
9
0
05 May 2023
ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal,
  Causal, and Discourse Relations
ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
Chunkit Chan
Cheng Jiayang
Weiqi Wang
Yuxin Jiang
Tianqing Fang
Xin Liu
Yangqiu Song
LRM
78
60
0
28 Apr 2023
SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual
  Large Language Model
SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model
Juexiao Zhou
Xiao-Zhen He
Liyuan Sun
Jiannan Xu
Xiuying Chen
Yuetan Chu
Longxi Zhou
Xingyu Liao
Bin Zhang
Xin Gao
LM&MA
14
23
0
21 Apr 2023
Large AI Models in Health Informatics: Applications, Challenges, and the
  Future
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny P. L. Lo
AI4MH
LM&MA
38
123
0
21 Mar 2023
On the Impossible Safety of Large AI Models
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
26
31
0
30 Sep 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
4,048
0
24 May 2022
You Don't Know My Favorite Color: Preventing Dialogue Representations
  from Revealing Speakers' Private Personas
You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas
Haoran Li
Yangqiu Song
Lixin Fan
59
17
0
26 Apr 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
205
1,651
0
15 Oct 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural
  Language Inference
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,584
0
21 Jan 2020
Previous
12345