ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.18018
  4. Cited By
On Prompt-Driven Safeguarding for Large Language Models

On Prompt-Driven Safeguarding for Large Language Models

31 January 2024
Chujie Zheng
Fan Yin
Hao Zhou
Fandong Meng
Jie Zhou
Kai-Wei Chang
Minlie Huang
Nanyun Peng
    AAML
ArXivPDFHTML

Papers citing "On Prompt-Driven Safeguarding for Large Language Models"

13 / 13 papers shown
Title
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
14
0
0
12 May 2025
Sun-Shine: A Large Language Model for Tibetan Culture
Sun-Shine: A Large Language Model for Tibetan Culture
Cheng Huang
Fan Gao
Nyima Tashi
Yutong Liu
Xiangxiang Wang
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ALM
83
2
0
24 Mar 2025
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu
Haoyu Zhao
Xinran Gu
Dingli Yu
Anirudh Goyal
Sanjeev Arora
ALM
75
41
0
20 Jan 2025
Yi-Lightning Technical Report
Yi-Lightning Technical Report
01. AI
:
Alan Wake
Albert Wang
Bei Chen
...
Yuxuan Sha
Zhaodong Yan
Zhiyuan Liu
Zirui Zhang
Zonghong Dai
OSLM
97
3
0
02 Dec 2024
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao
Xiang Zheng
Lin Luo
Yige Li
Xingjun Ma
Yu-Gang Jiang
VLM
AAML
50
3
0
28 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
On the Role of Attention Heads in Large Language Model Safety
Z. Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Junfeng Fang
Yongbin Li
48
5
0
17 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
45
13
0
15 Oct 2024
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
46
5
0
04 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
49
9
0
30 Sep 2024
Personality Alignment of Large Language Models
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
Yue Zhang
ALM
44
5
0
21 Aug 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
64
6
0
08 Jun 2024
Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation
Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation
Chujie Zheng
Minlie Huang
51
44
0
14 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
1