Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.11668
Cited By
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
17 June 2024
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Jiayi Mao
Xueqi Cheng
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
""Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak"
9 / 9 papers shown
Title
a1: Steep Test-time Scaling Law via Environment Augmented Generation
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Yuyao Ge
Jun Wan
Yurong Wu
Xueqi Cheng
LRM
15
0
0
20 Apr 2025
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Baolong Bi
Shenghua Liu
Y. Wang
Yilong Xu
Junfeng Fang
Lingrui Mei
Xueqi Cheng
KELM
53
4
0
20 Mar 2025
Context-DPO: Aligning Language Models for Context-Faithfulness
Baolong Bi
Shaohan Huang
Y. Wang
Tianchi Yang
Zihan Zhang
...
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
Shenghua Liu
94
8
0
18 Dec 2024
HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Ruibin Yuan
Xueqi Cheng
26
1
0
03 Oct 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
41
6
0
20 Jul 2024
SLANG: New Concept Comprehension of Large Language Models
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Xueqi Chen
KELM
17
2
0
23 Jan 2024
Insights into Classifying and Mitigating LLMs' Hallucinations
Alessandro Bruno
P. Mazzeo
Aladine Chetouani
M. Tliba
M. A. Kerkouri
HILM
22
6
0
14 Nov 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
135
139
0
16 Oct 2023
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
93
162
0
15 Apr 2021
1