ResearchTrend.AI
RAIN: Your Language Models Can Align Themselves without Finetuning

International Conference on Learning Representations (ICLR), 2024
13 September 2023
Yuhui Li
Fangyun Wei
Jinjing Zhao
Chao Zhang
Hongyang R. Zhang
    SILM
ArXiv (abs) · PDF · HTML · HuggingFace (3 upvotes)

Papers citing "RAIN: Your Language Models Can Align Themselves without Finetuning"

50 / 114 papers shown
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Wei Zhao
Zhe Li
Jun Sun
AAML
196
0
0
04 Dec 2025
Factors That Support Grounded Responses in LLM Conversations: A Rapid Review
Gabriele Cesar Iwashima
Claudia Susie Rodrigues
Claudio Dipolitto
Geraldo Xexéo
95
0
0
24 Nov 2025
AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren
Shahar Katz
Lior Wolf
AAML
236
2
0
15 Nov 2025
Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space
Sekitoshi Kanai
Tsukasa Yoshida
Hiroshi Takahashi
Haru Kuroki
Kazumune Hashimoto
145
0
0
30 Oct 2025
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong
Shuya Feng
Nima Naderloui
Shenao Yan
Jingyu Zhang
Biying Liu
Ali Arastehfard
Heqing Huang
Yuan Hong
AAML
296
2
0
17 Oct 2025
Proactive defense against LLM Jailbreak
Weiliang Zhao
Jinjun Peng
Daniel Ben-Levi
Zhou Yu
Junfeng Yang
AAML
203
2
0
06 Oct 2025
Kwai Keye-VL 1.5 Technical Report
Biao Yang
Bin Wen
Boyang Ding
Changyi Liu
Chenglong Chu
...
S. Wang
X. Luo
Yan Li
Yuhang Hu
Zixing Zhang
VLM
377
32
0
01 Sep 2025
SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li
Xiaodong Wu
Qi Li
Jianbing Ni
Rongxing Lu
AAML · MU · KELM
118
1
0
21 Aug 2025
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas
Mao Nishino
Samuel Jacob Chacko
Xiuwen Liu
AAML
225
2
0
20 Aug 2025
A Survey on Training-free Alignment of Large Language Models
Birong Pan
Yongqi Li
Jiasheng Si
Sibo Wei
Mayi Xu
Shen Zhou
Yuanyuan Zhu
Ming Zhong
T. Qian
3DV · LM&MA
533
2
0
12 Aug 2025
P-Aligner: Enabling Pre-Alignment of Language Models via Principled Instruction Synthesis
Feifan Song
Bofei Gao
Yifan Song
Yi Liu
Weimin Xiong
Yuyang Song
Tianyu Liu
Guoyin Wang
Houfeng Wang
ALM · LLMSV
225
1
0
06 Aug 2025
PUZZLED: Jailbreaking LLMs through Word-Based Puzzles
Yelim Ahn
Jaejin Lee
AAML
86
1
0
02 Aug 2025
SDD: Self-Degraded Defense against Malicious Fine-tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
ZiXuan Chen
Weikai Lu
Xin Lin
Ziqian Zeng
AAML
201
7
0
27 Jul 2025
PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization
Han Jiang
Dongyao Zhu
Zhihua Wei
Xiaoyuan Yi
Ziang Xiao
Xing Xie
283
1
0
22 Jul 2025
Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Hiroshi Matsuda
Chunpeng Ma
Masayuki Asahara
390
6
0
11 Jun 2025
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Feifan Song
Shaohang Wei
Wen Luo
Yuxuan Fan
Tianyu Liu
Guoyin Wang
Houfeng Wang
258
5
0
09 Jun 2025
SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs
Shaona Ghosh
Amrita Bhattacharjee
Yftah Ziser
Christopher Parisien
LLMSV
381
8
0
01 Jun 2025
Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment
Kundan Krishna
Joseph Y Cheng
Charles Maalouf
Leon A Gatys
354
2
0
30 May 2025
LLM Agents Should Employ Security Principles
Kaiyuan Zhang
Zian Su
Pin-Yu Chen
E. Bertino
Xiangyu Zhang
Ninghui Li
LLMAG
403
15
0
29 May 2025
Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Y. Zhang
Yu Yu
Bo Tang
Yu Zhu
Chuxiong Sun
...
Jie Hu
Zipeng Xie
Zhiyu Li
Feiyu Xiong
Edward Chung
517
0
0
26 May 2025
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
Zhexin Zhang
Xian Qi Loye
Victor Shea-Jay Huang
Junxiao Yang
Qi Zhu
...
Fei Mi
Lifeng Shang
Yingkang Wang
Hongning Wang
Shiyu Huang
LRM
384
16
0
21 May 2025
Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models
Md Rafi Ur Rashid
Vishnu Asutosh Dasu
Ye Wang
Gang Tan
Shagufta Mehnaz
AAML · ELM
443
0
0
20 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka
Jimmy Dani
Ausmit Mondal
Nitesh Saxena
AAML
282
0
0
08 May 2025
What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Michael A. Hedderich
Anyi Wang
Raoyuan Zhao
Florian Eichin
Jonas Fischer
Barbara Plank
405
4
0
22 Apr 2025
Geneshift: Impact of different scenario shift on Jailbreaking LLM
Tianyi Wu
Zhiwei Xue
Yue Liu
Jiaheng Zhang
Bryan Hooi
See-Kiong Ng
398
2
0
10 Apr 2025
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Zhouhang Xie
Junda Wu
Yiran Shen
Yu Xia
Xintong Li
...
Sachin Kumar
Bodhisattwa Prasad Majumder
Jingbo Shang
Prithviraj Ammanabrolu
Julian McAuley
501
10
0
09 Apr 2025
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
Yiwei Chen
Yuguang Yao
Yihua Zhang
Bingquan Shen
Gaowen Liu
Sijia Liu
AAML · MU
468
9
0
14 Mar 2025
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
Wenhui Zhang
Huiyu Xu
Peng Kuang
Zeqing He
Ziqi Zhu
Kui Ren
AAML · PILM
260
5
0
09 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
565
3
0
06 Mar 2025
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
Kuang-Da Wang
Teng-Ruei Chen
Yu-Heng Hung
Shuoyang Ding
Yueh-Hua Wu
Yu-Chun Wang
Chao-Han Huck Yang
Wen-Chih Peng
Ping-Chun Hsieh
401
0
0
28 Feb 2025
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng
Xiaolong Jin
Jinyuan Jia
Xinsong Zhang
AAML
873
22
0
27 Feb 2025
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
International Conference on Learning Representations (ICLR), 2025
Zhaowei Zhang
Fengshuo Bai
Qizhi Chen
Chengdong Ma
Mingzhi Wang
Haoran Sun
Zilong Zheng
Wenbo Ding
703
23
0
26 Feb 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
Volkan Cevher
AAML
343
7
0
24 Feb 2025
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Zhexin Zhang
Leqi Lei
Junxiao Yang
Xijie Huang
Yida Lu
...
Xianqi Lei
Changzai Pan
Lei Sha
Han Wang
Shiyu Huang
AAML
275
11
0
24 Feb 2025
Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment
Somnath Banerjee
Sayan Layek
Pratyush Chatterjee
Animesh Mukherjee
Rima Hazra
LLMSV
436
5
0
16 Feb 2025
Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions
Jingxin Xu
Guoshun Nan
Sheng Guan
Sicong Leng
Wenshu Fan
Zixiao Wang
Yuyang Ma
Zhili Zhou
Yanzhao Hou
Xiaofeng Tao
LM&MA
368
2
0
08 Feb 2025
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Neural Information Processing Systems (NeurIPS), 2024
Xuan Chen
Yuzhou Nie
Wenbo Guo
Xiangyu Zhang
465
50
0
28 Jan 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jonathan Nöther
Adish Singla
Goran Radanović
AAML
472
4
0
14 Jan 2025
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yang Ouyang
Hengrui Gu
Shuhang Lin
Qingfeng Lan
Jie Peng
B. Kailkhura
Tianlong Chen
Kaixiong Zhou
AAML
371
10
0
05 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM · AILaw
1.3K
388
0
25 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
647
9
0
18 Nov 2024
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Somanshu Singla
Zhen Wang
Tianyang Liu
Abdullah Ashfaq
Zhiting Hu
Eric Xing
374
14
0
13 Nov 2024
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiawei Zhao
Kejiang Chen
Weinan Zhang
Nenghai Yu
AAML
675
8
0
03 Nov 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
391
6
0
25 Oct 2024
Adversarial Attacks on Large Language Models Using Regularized Relaxation
Samuel Jacob Chacko
Sajib Biswas
Chashi Mahiul Islam
Fatema Tabassum Liza
Xiuwen Liu
AAML
277
10
0
24 Oct 2024
LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang
Kai Kiat Goh
Peixin Zhang
Jun Sun
Rose Lin Xin
Hongyu Zhang
740
6
0
22 Oct 2024
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu
Yifu Lu
Yifan Zeng
Jiacheng Guo
Jiayi Geng
...
Ling Yang
Kaixuan Huang
Yue Wu
Mengdi Wang
582
57
0
18 Oct 2024
SPIN: Self-Supervised Prompt INjection
Leon Zhou
Junfeng Yang
Chengzhi Mao
AAML · SILM
291
1
0
17 Oct 2024
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu
Yue Feng
Zhao Xu
Lixin Su
Xinyu Ma
D. Yin
Hao Liu
ELM
348
39
0
11 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu
Xiaoxin He
Miao Xiong
Jinlan Fu
Shumin Deng
Bryan Hooi
AAML
270
55
0
02 Oct 2024
Page 1 of 3