Consistency Analysis of ChatGPT

11 March 2023

Papers citing "Consistency Analysis of ChatGPT"


Consistency in Language Models: Current Landscape, Challenges, and Future Directions Jekaterina Novikova Carol Anderson Borhane Blili-Hamelin Subhabrata Majumdar HILM 69 0 0 01 May 2025
Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing Jihyun Janice Ahn Wenpeng Yin SILM LRM 58 1 0 02 Apr 2025
Large Language Models Often Say One Thing and Do Another Ruoxi Xu Hongyu Lin Xianpei Han Jia Zheng Weixiang Zhou Le Sun Yingfei Sun 45 1 0 10 Mar 2025
Automated Consistency Analysis of LLMs Aditya Patwardhan Vivek Vaidya Ashish Kundu 53 0 0 10 Feb 2025
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks Yingzhe Peng Xiaoting Qin Zhiyang Zhang Jue Zhang Qingwei Lin Xu Yang Dongmei Zhang Saravan Rajmohan Qi Zhang 36 3 0 31 Oct 2024
Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering Yanggyu Lee Jihie Kim 38 1 0 20 Oct 2024
A Prompt Engineering Approach and a Knowledge Graph based Framework for Tackling Legal Implications of Large Language Model Answers George Hannah Rita T. Sousa Ioannis Dasoulas Claudia d'Amato AILaw ELM 39 0 0 19 Oct 2024
MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs) Shih-Han Chou Shivam Chandhok James J. Little Leonid Sigal 35 0 0 07 Oct 2024
Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models Yinhong Liu Zhijiang Guo Tianya Liang Ehsan Shareghi Ivan Vulić Nigel Collier 91 0 0 03 Oct 2024
Logically Consistent Language Models via Neuro-Symbolic Integration Diego Calanzone Stefano Teso Antonio Vergari LRM 71 6 0 09 Sep 2024
A Survey on the Real Power of ChatGPT Ming Liu Ran Liu Ye Zhu Hua Wang Youyang Qu Rongsheng Li Yongpan Sheng Wray L. Buntine 42 2 0 22 Apr 2024
Scope Ambiguities in Large Language Models Gaurav Kamath Sebastian Schuster Sowmya Vajjala Siva Reddy 27 2 0 05 Apr 2024
Reasoning Runtime Behavior of a Program with LLM: How Far Are We? Junkai Chen Zhiyuan Pan Xing Hu Zhenhao Li Ge Li Xin Xia LRM 32 20 0 25 Mar 2024
Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries Yiqiao Jin Mohit Chandra Gaurav Verma Yibo Hu Munmun De Choudhury Srijan Kumar LM&MA ELM 87 66 0 19 Oct 2023
Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity A. Amirova T. Fteropoulli Nafiso Ahmed Martin R. Cowie Joel Z. Leibo 18 5 0 06 Sep 2023
Do You Trust ChatGPT? -- Perceived Credibility of Human and AI-Generated Content Martin Huschens Martin Briesch Dominik Sobania Franz Rothlauf 13 12 0 05 Sep 2023
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models Yuheng Huang Jiayang Song Zhijie Wang Shengming Zhao Huaming Chen Felix Juefei-Xu Lei Ma 28 3 0 16 Jul 2023
Evaluating Superhuman Models with Consistency Checks Lukas Fluri Daniel Paleka Florian Tramèr ELM 31 41 0 16 Jun 2023
Language models are not naysayers: An analysis of language models on negation benchmarks Thinh Hung Truong Timothy Baldwin Karin Verspoor Trevor Cohn 22 54 0 14 Jun 2023
Utilizing ChatGPT to Enhance Clinical Trial Enrollment Georgios Peikos S. Symeonidis Pranav Kasela G. Pasi LM&MA 14 12 0 03 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets Md Tahmid Rahman Laskar M Saiful Bari Mizanur Rahman Md Amran Hossen Bhuiyan Shafiq R. Joty J. Huang LM&MA ELM ALM 41 178 0 29 May 2023
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses Xenia Ohmer Elia Bruni Dieuwke Hupkes LRM 15 13 0 19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation Xiaowei Huang Wenjie Ruan Wei Huang Gao Jin Yizhen Dong ... Sihao Wu Peipei Xu Dengyu Wu André Freitas Mustafa A. Mustafa ALM 27 81 0 19 May 2023
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT Xinyue Shen Z. Chen Michael Backes Yang Zhang 19 55 0 18 Apr 2023
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning Viet Dac Lai Nghia Trung Ngo Amir Pouran Ben Veyseh Hieu Man Franck Dernoncourt Trung Bui Thien Huu Nguyen ELM LM&MA 25 267 0 12 Apr 2023
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations Jaehun Jung Lianhui Qin Sean Welleck Faeze Brahman Chandra Bhagavatula Ronan Le Bras Yejin Choi ReLM LRM 218 189 0 24 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 303 11,881 0 04 Mar 2022
The Power of Scale for Parameter-Efficient Prompt Tuning Brian Lester Rami Al-Rfou Noah Constant VPVLM 280 3,843 0 18 Apr 2021