2501.08145
Cited By
Refusal Behavior in Large Language Models: A Nonlinear Perspective
14 January 2025
Fabian Hildebrandt
Andreas K. Maier
Patrick Krauss
A. Schilling
Papers citing
"Refusal Behavior in Large Language Models: A Nonlinear Perspective"
6 / 6 papers shown
AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren
Shahar Katz
Lior Wolf
AAML
249
2
0
15 Nov 2025
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Md Abdullah Al Mamun
Ihsen Alouani
Nael B. Abu-Ghazaleh
128
1
0
28 Aug 2025
The Geometry of Harmfulness in LLMs through Subconcept Probing
McNair Shah
Saleena Angeline
Adhitya Rajendra Kumar
Naitik Chheda
Kevin Zhu
Sean O'Brien
Will Cai
LLMSV
315
4
0
23 Jul 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA
AI4CE
400
6
0
05 Jun 2025
From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
John Mavi
Diana Teodora Găitan
Sergio Coronado
235
0
0
05 Jun 2025
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Stanley Yu
Vaidehi Bulusu
Oscar Yasunaga
Clayton Lau
Cole Blondin
Sean O'Brien
Kevin Zhu
251
2
0
27 May 2025