arXiv:2308.12833
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
24 August 2023
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
Papers citing
"Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities"
32 / 32 papers shown
Title
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet
Xiao Huang
Dennis Jacob
Annabella Chow
Maha Alrashed
Geng Zhao
Zhanhao Hu
Chawin Sitawarin
Basel Alomair
David A. Wagner
AAML
60
0
0
28 Apr 2025
FDLLM: A Text Fingerprint Detection Method for LLMs in Multi-Language, Multi-Domain Black-Box Environments
Zhiyuan Fu
Junfan Chen
Hongyu Sun
Ting Yang
Ruidong Li
Yuqing Zhang
45
0
0
28 Jan 2025
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim
Yang Li
Evangelia Spiliopoulou
Jie Ma
Miguel Ballesteros
William Yang Wang
MIALM
85
3
2
10 Oct 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong
Yuelin Zou
Lijie Hu
Ziqian Zeng
Di Wang
Haiqin Yang
AAML
MU
24
2
0
09 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
64
1
0
09 Oct 2024
Fine-tuning can Help Detect Pretraining Data from Large Language Models
H. Zhang
Songxin Zhang
Bingyi Jing
Hongxin Wei
34
0
0
09 Oct 2024
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
Weichao Zhang
Ruqing Zhang
Jiafeng Guo
Maarten de Rijke
Yixing Fan
Xueqi Cheng
20
7
0
23 Sep 2024
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
Tianle Gu
Kexin Huang
Ruilin Luo
Yuanqi Yao
Yujiu Yang
Yan Teng
Yingchun Wang
MU
15
4
0
18 Sep 2024
Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches
Jamal N. Al-Karaki
Muhammad Al-Zafar Khan
Marwan Omar
23
4
0
11 Sep 2024
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Cheng Wang
Yiwei Wang
Bryan Hooi
Yujun Cai
Nanyun Peng
Kai-Wei Chang
35
2
0
05 Sep 2024
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study
Jerson Francia
Derek Hansen
Ben Schooley
Matthew Taylor
Shydra Murray
Greg Snow
18
1
0
18 Jun 2024
"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania
Ruiyi Wang
Toby Jia-Jun Li
Tianshi Li
Hong Shen
23
6
0
28 Mar 2024
PROMISE: A Framework for Developing Complex Conversational Interactions (Technical Report)
Wenyuan Wu
Jasmin Heierli
Max Meisterhans
Adrian Moser
Andri Farber
Mateusz Dolata
Elena Gavagnin
Alexandre de Spindler
Gerhard Schwabe
11
0
0
06 Dec 2023
UOR: Universal Backdoor Attacks on Pre-trained Language Models
Wei Du
Peixuan Li
Bo-wen Li
Haodong Zhao
Gongshen Liu
AAML
37
8
0
16 May 2023
Defending against Insertion-based Textual Backdoor Attacks via Attribution
Jiazhao Li
Zhuofeng Wu
Wei Ping
Chaowei Xiao
V. Vydiswaran
40
23
0
03 May 2023
Mitigating Approximate Memorization in Language Models via Dissimilarity Learned Policy
Aly M. Kassem
21
2
0
02 May 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
90
124
0
01 May 2023
Emergent autonomous scientific research capabilities of large language models
Daniil A. Boiko
R. MacKnight
Gabe Gomes
ELM
LM&Ro
AI4CE
LLMAG
101
115
0
11 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,232
0
22 Mar 2023
Susceptibility to Influence of Large Language Models
Lewis D. Griffin
Bennett Kleinberg
Maximilian Mozes
Kimberly T. Mai
Maria Vau
M. Caldwell
Augustine N. Mavor-Parker
39
14
0
10 Mar 2023
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He
Qiongkai Xu
Yi Zeng
Lingjuan Lyu
Fangzhao Wu
Jiwei Li
R. Jia
WaLM
163
71
0
19 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
216
327
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
205
364
0
15 Oct 2021
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
128
258
0
13 Oct 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
882
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020
Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification
Chuanshuai Chen
Jiazhu Dai
SILM
48
126
0
11 Jul 2020
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
195
607
0
03 Sep 2019
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
230
909
0
21 Apr 2018