Refusal Behavior in Large Language Models: A Nonlinear Perspective

14 January 2025
Fabian Hildebrandt
Andreas K. Maier
Patrick Krauss
A. Schilling

Papers citing "Refusal Behavior in Large Language Models: A Nonlinear Perspective"

6 / 6 papers shown
AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren
Shahar Katz
Lior Wolf
AAML
249
2
0
15 Nov 2025
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Md Abdullah Al Mamun
Ihsen Alouani
Nael B. Abu-Ghazaleh
128
1
0
28 Aug 2025
The Geometry of Harmfulness in LLMs through Subconcept Probing
McNair Shah
Saleena Angeline
Adhitya Rajendra Kumar
Naitik Chheda
Kevin Zhu
Sean O'Brien
Will Cai
LLMSV
315
4
0
23 Jul 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA, AI4CE
400
6
0
05 Jun 2025
From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
John Mavi
Diana Teodora Găitan
Sergio Coronado
235
0
0
05 Jun 2025
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Stanley Yu
Vaidehi Bulusu
Oscar Yasunaga
Clayton Lau
Cole Blondin
Sean O'Brien
Kevin Zhu
251
2
0
27 May 2025