ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.10719
  4. Cited By
LLM Censorship: A Machine Learning Challenge or a Computer Security
  Problem?

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

20 July 2023
David Glukhov
Ilia Shumailov
Y. Gal
Nicolas Papernot
V. Papyan
    AAML
    ELM
ArXivPDFHTML

Papers citing "LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?"

13 / 13 papers shown
Title
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
Marco Arazzi
Vignesh Kumar Kembu
Antonino Nocera
V. P.
78
0
0
30 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David E. Evans
LLMSV
72
0
0
23 Apr 2025
What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
Sander Noels
Guillaume Bied
Maarten Buyl
Alexander Rogiers
Yousra Fettach
Jefrey Lijffijt
Tijl De Bie
28
0
0
04 Apr 2025
Exploring LLMs for Malware Detection: Review, Framework Design, and
  Countermeasure Approaches
Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches
Jamal N. Al-Karaki
Muhammad Al-Zafar Khan
Marwan Omar
26
4
0
11 Sep 2024
Securing the Future of GenAI: Policy and Technology
Securing the Future of GenAI: Policy and Technology
Mihai Christodorescu
Craven
S. Feizi
Neil Zhenqiang Gong
Mia Hoffmann
...
Jessica Newman
Emelia Probasco
Yanjun Qi
Khawaja Shams
Turek
SILM
26
3
0
21 May 2024
Navigating LLM Ethics: Advancements, Challenges, and Future Directions
Navigating LLM Ethics: Advancements, Challenges, and Future Directions
Junfeng Jiao
S. Afroogh
Yiming Xu
Connor Phillips
AILaw
55
19
0
14 May 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
StruQ: Defending Against Prompt Injection with Structured Queries
StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen
Julien Piet
Chawin Sitawarin
David A. Wagner
SILM
AAML
22
65
0
09 Feb 2024
Demystifying RCE Vulnerabilities in LLM-Integrated Apps
Demystifying RCE Vulnerabilities in LLM-Integrated Apps
Tong Liu
Zizhuang Deng
Guozhu Meng
Yuekang Li
Kai Chen
SILM
29
19
0
06 Sep 2023
The Internal State of an LLM Knows When It's Lying
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
216
297
0
26 Apr 2023
On the Impossible Safety of Large AI Models
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
26
31
0
30 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
1