ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak

17 June 2024
Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Jiayi Mao, Xueqi Cheng
AAML

Papers citing ""Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak"

9 / 9 papers shown

1. a1: Steep Test-time Scaling Law via Environment Augmented Generation
   Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Yuyao Ge, Jun Wan, Yurong Wu, Xueqi Cheng
   LRM · 20 Apr 2025

2. Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
   Baolong Bi, Shenghua Liu, Y. Wang, Yilong Xu, Junfeng Fang, Lingrui Mei, Xueqi Cheng
   KELM · 20 Mar 2025

3. Context-DPO: Aligning Language Models for Context-Faithfulness
   Baolong Bi, Shaohan Huang, Y. Wang, Tianchi Yang, Zihan Zhang, ..., Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Shenghua Liu
   18 Dec 2024

4. HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router
   Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Ruibin Yuan, Xueqi Cheng
   03 Oct 2024

5. Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
   Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
   20 Jul 2024

6. SLANG: New Concept Comprehension of Large Language Models
   Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Xueqi Cheng
   KELM · 23 Jan 2024

7. Insights into Classifying and Mitigating LLMs' Hallucinations
   Alessandro Bruno, P. Mazzeo, Aladine Chetouani, M. Tliba, M. A. Kerkouri
   HILM · 14 Nov 2023

8. Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
   Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
   AAML · 16 Oct 2023

9. Gradient-based Adversarial Attacks against Text Transformers
   Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela
   SILM · 15 Apr 2021