Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

15 May 2023
Wen-song Ye
Mingfeng Ou
Tianyi Li
Yipeng Chen
Xuetao Ma
Yifan YangGong
Sai Wu
Jie Fu
Gang Chen
Haobo Wang
J. Zhao
arXiv: 2305.10235

Papers citing "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"

29 papers
A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen
Xiuze Zhou
Yuanguo Lin
Shibo Feng
Li Shen
Pengcheng Wu
AILaw
PILM
04 May 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Aryan Shrivastava
Paula Akemi Aoyagui
14 Apr 2025
Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing
Jihyun Janice Ahn
Wenpeng Yin
SILM
LRM
02 Apr 2025
Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
Dipin Khati
Yijin Liu
David Nader-Palacio
Yixuan Zhang
Denys Poshyvanyk
18 Mar 2025
Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language Generation
Damien de Mijolla
Hannan Saddiq
Kim Moore
02 Mar 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
06 Feb 2025
Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations
Aryan Shrivastava
Jessica Hullman
Max Lamparth
17 Oct 2024
Generalists vs. Specialists: Evaluating Large Language Models for Urdu
Samee Arif
Abdul Hameed Azeemi
Agha Ali Raza
Awais Athar
ALM
LM&MA
ELM
05 Jul 2024
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
Yuyan Chen
Qiang Fu
Yichen Yuan
Zhihao Wen
Ge Fan
Dayiheng Liu
Dongmei Zhang
Zhixu Li
Yanghua Xiao
HILM
04 Jul 2024
NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations
Junkai Chen
Zhenhao Li
Xing Hu
Xin Xia
AAML
28 Jun 2024
Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing
Hadi Askari
Anshuman Chhabra
Muhao Chen
Prasant Mohapatra
06 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
03 Jun 2024
Data Contamination Calibration for Black-box LLMs
Wen-song Ye
Jiaqi Hu
Liyao Li
Haobo Wang
Gang Chen
Junbo Zhao
20 May 2024
Evaluating Consistency and Reasoning Capabilities of Large Language Models
Yash Saxena
Sarthak Chopra
Arunendra Mani Tripathi
ELM
LRM
25 Apr 2024
LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models
Mouhamed Amine Bouchiha
Quentin Telnoff
Souhail Bakkali
R. Champagnat
Mourad Rabah
Mickael Coustaty
Y. Ghamri-Doudane
LRM
20 Apr 2024
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
Sang T. Truong
D. Q. Nguyen
Toan Nguyen
Dong D. Le
Nhi N. Truong
Tho Quan
Oluwasanmi Koyejo
05 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
28 Feb 2024
AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach
Maryam Amirizaniani
Elias Martin
Tanya Roosta
Aman Chadha
Chirag Shah
14 Feb 2024
Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models
Carlos Alejandro Aguirre
Kuleen Sasse
Isabel Cachola
Mark Dredze
14 Nov 2023
Can LLM-Generated Misinformation Be Detected?
Canyu Chen
Kai Shu
DeLMO
25 Sep 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Erik Cambria
ELM
LM&MA
24 Aug 2023
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
Liangyu Zha
Junlin Zhou
Liyao Li
Rui Wang
Qingyi Huang
...
Xing-yan Deng
J. Xu
Haobo Wang
Gang Chen
J. Zhao
RALM
LMTD
17 Jul 2023
Robust Prompt Optimization for Large Language Models Against Distribution Shifts
Moxin Li
Wenjie Wang
Fuli Feng
Yixin Cao
Jizhi Zhang
Tat-Seng Chua
OffRL
23 May 2023
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
06 Jan 2021
WARP: Word-level Adversarial ReProgramming
Karen Hambardzumyan
Hrant Khachatrian
Jonathan May
AAML
01 Jan 2021
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktäschel
Thomas Lukasiewicz
Phil Blunsom
LRM
04 Dec 2018
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
21 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
20 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
17 Apr 2018