Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.11521
Cited By
Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models
16 August 2023
Zhenhua Wang
Wei Xie
Kai Chen
Baosheng Wang
Zhiwen Gui
Enze Wang
AAML
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models"
6 / 6 papers shown
Title
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
24
36
0
06 May 2024
Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models
Shuai Zhao
Jinming Wen
Anh Tuan Luu
J. Zhao
Jie Fu
SILM
54
88
0
02 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
200
2,232
0
22 Mar 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
216
327
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Nicholas Carlini
AAML
136
65
0
04 May 2021
1