ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.12867
  4. Cited By
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and
  Toxicity

Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity

30 January 2023
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
    SILM
ArXivPDFHTML

Papers citing "Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity"

20 / 20 papers shown
Title
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
93
151
0
28 Jan 2025
PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles
PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles
Li Siyan
Vethavikashini Chithrra Raghuram
Omar Khattab
Julia Hirschberg
Zhou Yu
21
7
0
22 Oct 2024
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction
  Amplification
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang
Yicong Tan
Yun Shen
Ahmed Salem
Michael Backes
Savvas Zannettou
Yang Zhang
LLMAG
AAML
40
14
0
30 Jul 2024
Unmasking and Quantifying Racial Bias of Large Language Models in
  Medical Report Generation
Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation
Yifan Yang
Xiaoyu Liu
Qiao Jin
Furong Huang
Zhiyong Lu
17
23
0
25 Jan 2024
From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment
  Technique
From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique
Sina Elahimanesh
Shayan Salehi
Sara Zahedi Movahed
Lisa Alazraki
Ruoyu Hu
Abbas Edalat
19
0
0
13 Oct 2023
A Survey on Fairness in Large Language Models
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
37
59
0
20 Aug 2023
Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency
  in coding algorithms and data structures
Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures
Sayed Erfan Arefin
T. Ashrafi
H. Al-Qudah
Ynes Ineza
Abdul Serwadda
ELM
23
6
0
10 Jul 2023
DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of
  Machine-Generated Text
DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text
Jinyan Su
Terry Yue Zhuo
Di Wang
Preslav Nakov
DeLMO
22
121
0
23 May 2023
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
245
1,071
0
05 Oct 2022
CATER: Intellectual Property Protection on Text Generation APIs via
  Conditional Watermarks
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He
Qiongkai Xu
Yi Zeng
Lingjuan Lyu
Fangzhao Wu
Jiwei Li
R. Jia
WaLM
177
71
0
19 Sep 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language
  Augmentation
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
...
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
169
86
0
06 Dec 2021
Protecting Intellectual Property of Language Generation APIs with
  Lexical Watermark
Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
Xuanli He
Qiongkai Xu
Lingjuan Lyu
Fangzhao Wu
Chenguang Wang
WaLM
172
93
0
05 Dec 2021
Fast Model Editing at Scale
Fast Model Editing at Scale
E. Mitchell
Charles Lin
Antoine Bosselut
Chelsea Finn
Christopher D. Manning
KELM
219
341
0
21 Oct 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
367
0
15 Oct 2021
Towards Continual Knowledge Learning of Language Models
Towards Continual Knowledge Learning of Language Models
Joel Jang
Seonghyeon Ye
Sohee Yang
Joongbo Shin
Janghoon Han
Gyeonghun Kim
Stanley Jungkyu Choi
Minjoon Seo
CLL
KELM
222
150
0
07 Oct 2021
Learning to Prompt for Vision-Language Models
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
237
588
0
14 Jul 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,808
0
14 Dec 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
Generating Natural Language Adversarial Examples
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
243
914
0
21 Apr 2018
1