Advancing LLM Safe Alignment with Safety Representation Ranking


21 May 2025
Tianqi Du
Zeming Wei
Quan Chen
Chenheng Zhang
Yisen Wang
    ALM

Papers citing "Advancing LLM Safe Alignment with Safety Representation Ranking"

6 / 6 papers shown
Language Ranker: A Lightweight Ranking framework for LLM Decoding
Chenheng Zhang
Tianqi Du
Jizhe Zhang
Mingqing Xiao
Yifei Wang
Yisen Wang
Zhouchen Lin
ALM
190
0
0
23 Oct 2025
AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software
Rui Yang
Michael Fu
Chakkrit Tantithamthavorn
Chetan Arora
Gunel Gulmammadova
Joey Chua
137
0
0
21 Sep 2025
ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs
Zeming Wei
Chengcan Wu
Meng Sun
215
3
0
02 Jun 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
601
84
0
28 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM LRM
928
571
0
03 Jan 2025
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie
Xiangyu Qi
Yi Zeng
Yangsibo Huang
Udari Madhushani Sehwag
...
Bo Li
Kai Li
Danqi Chen
Peter Henderson
Prateek Mittal
ALM ELM
423
135
0
20 Jun 2024