ResearchTrend.AI
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4

7 April 2023
Hanmeng Liu
Ruoxi Ning
Zhiyang Teng
Jian Liu
Qiji Zhou
Yuexin Zhang
    ELM
    ReLM
    LRM

Papers citing "Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4"

50 / 161 papers shown
Title
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
41
31
0
24 Oct 2023
Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation
Jason Samuel Lucas
Adaku Uchendu
Michiharu Yamashita
Jooyoung Lee
Shaurya Rohatgi
Dongwon Lee
32
42
0
24 Oct 2023
Exploring the Boundaries of GPT-4 in Radiology
Qianchu Liu
Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
...
Anja Thieme
A. Nori
M. Lungren
Ozan Oktay
Javier Alvarez-Valle
LM&MA
AI4CE
42
36
0
23 Oct 2023
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Da Song
Xuan Xie
Jiayang Song
Derui Zhu
Yuheng Huang
Felix Juefei Xu
Lei Ma
ALM
40
3
0
22 Oct 2023
Retrieval-Augmented Neural Response Generation Using Logical Reasoning and Relevance Scoring
Nicholas Walker
Stefan Ultes
Pierre Lison
RALM
LRM
38
2
0
20 Oct 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Di Wang
Jingfeng Zhang
Mohan Kankanhalli
AAML
SILM
36
69
0
20 Oct 2023
On the Effectiveness of Creating Conversational Agent Personalities Through Prompting
Heng Gu
Chadha Degachi
Uğur Genç
Senthil K. Chandrasegaran
Himanshu Verma
AI4CE
21
7
0
17 Oct 2023
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology
Odhran O'Donoghue
Aleksandar Shtedritski
John Ginger
Ralph Abboud
Ali E. Ghareeb
Justin Booth
Samuel G. Rodriques
22
18
0
16 Oct 2023
Autonomous Tree-search Ability of Large Language Models
Zheyu Zhang
Zhuorui Ye
Yikang Shen
Chuang Gan
LRM
32
0
0
14 Oct 2023
Learning To Teach Large Language Models Logical Reasoning
Meiqi Chen
Yubo Ma
Kaitao Song
Yixin Cao
Yan Zhang
Dongsheng Li
ELM
LRM
28
14
0
13 Oct 2023
Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure
Haotong Yang
Fanxu Meng
Zhouchen Lin
Muhan Zhang
LRM
31
2
0
09 Oct 2023
Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations
Deren Lei
Yaxi Li
Mengya Hu
Mingyu Wang
Vincent Yun
Emily Ching
Eslam Kamal
HILM
LRM
29
39
0
06 Oct 2023
Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench
Jen-tse Huang
Wenxuan Wang
E. Li
Man Ho Lam
Shujie Ren
Youliang Yuan
Wenxiang Jiao
Zhaopeng Tu
Michael R. Lyu
LM&MA
AI4MH
ALM
45
25
0
02 Oct 2023
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
Man Luo
Shrinidhi Kumbhar
Ming Shen
Mihir Parmar
Neeraj Varshney
Pratyay Banerjee
Somak Aditya
Chitta Baral
ReLM
ELM
LRM
47
27
0
02 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu
Wenlin Yao
Jianshu Chen
Xiaoman Pan
Xiaoyang Wang
Ninghao Liu
Dong Yu
LRM
22
28
0
30 Sep 2023
Can LLM-Generated Misinformation Be Detected?
Canyu Chen
Kai Shu
DeLMO
41
159
0
25 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
45
66
0
21 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
29
9
0
12 Sep 2023
Strategic Behavior of Large Language Models: Game Structure vs. Contextual Framing
Nunzio Lorè
Babak Heydari
29
33
0
12 Sep 2023
An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents
Maximilian Croissant
Madeleine Frister
Guy Schofield
Cade McCall
LLMAG
34
14
0
10 Sep 2023
Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences
S. Koneru
Jian Wu
Sarah Rajtmajer
29
9
0
07 Sep 2023
Everyone Deserves A Reward: Learning Customized Human Preferences
Pengyu Cheng
Jiawen Xie
Ke Bai
Yong Dai
Nan Du
19
30
0
06 Sep 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
53
10
0
28 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Min Zhang
ELM
LM&MA
38
101
0
24 Aug 2023
Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis
Akshat Gupta
LLMAG
AI4MH
26
10
0
23 Aug 2023
LLMRec: Benchmarking Large Language Models on Recommendation Task
Junling Liu
Chao-Hong Liu
Peilin Zhou
Qichen Ye
Dading Chong
...
Yueqi Xie
Dongyuan Li
Shoujin Wang
Chenyu You
Philip S. Yu
ALM
LRM
33
32
0
23 Aug 2023
Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries
Noel Ngu
Nathaniel Lee
Paulo Shakarian
35
4
0
22 Aug 2023
Conversational Ontology Alignment with ChatGPT
Sanaz Saki Norouzi
Mohammad Saeid Mahdavinejad
Pascal Hitzler
30
15
0
18 Aug 2023
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li
Baolin Peng
Pengcheng He
Xifeng Yan
ELM
SILM
AAML
41
24
0
17 Aug 2023
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Junru Lu
Siyu An
Mingbao Lin
Gabriele Pergola
Yulan He
Di Yin
Xing Sun
Yunsheng Wu
49
32
0
16 Aug 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
53
19
0
14 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM
LLMAG
35
59
0
08 Aug 2023
A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT
L. Vervoort
Vitaliy Mizyakov
Anastasia V. Ugleva
ReLM
ELM
LRM
24
1
0
05 Aug 2023
GPT-4 Can't Reason
Konstantine Arkoudas
ELM
LRM
AI4MH
25
33
0
21 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
30
88
0
20 Jul 2023
Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community
Qingyao Ai
Ting Bai
Bo Zhao
Yi-Ju Chang
Jiawei Chen
...
Peng Zhang
Fan Zhang
Wei-na Zhang
Hao Fei
Xiaofei Zhu
52
59
0
19 Jul 2023
How is ChatGPT's behavior changing over time?
Lingjiao Chen
Matei A. Zaharia
James Zou
ELM
KELM
AI4MH
44
415
0
18 Jul 2023
Can I say, now machines can think?
Nitisha Aggarwal
G. Saxena
Sanjeev Singh
Amit Pundir
LRM
AI4CE
21
3
0
11 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
75
1,538
0
06 Jul 2023
What Should Data Science Education Do with Large Language Models?
Xinming Tu
James Zou
Weijie J. Su
Linjun Zhang
AI4Ed
47
32
0
06 Jul 2023
From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data
Taiyu Ban
Lyvzhou Chen
Xiangyu Wang
Huanhuan Chen
ELM
31
58
0
29 Jun 2023
A negation detection assessment of GPTs: analysis with the xNot360 dataset
Nguyen Ha Thanh
Randy Goebel
Francesca Toni
Kostas Stathis
Ken Satoh
30
9
0
29 Jun 2023
Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
Lin F. Yang
Hongyang Chen
Zhao Li
Xiao Ding
Xindong Wu
KELM
40
87
0
20 Jun 2023
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond
Fangzhi Xu
Qika Lin
Jiawei Han
Tianzhe Zhao
Jun Liu
Min Zhang
ELM
LRM
44
33
0
16 Jun 2023
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Kaijie Zhu
Jindong Wang
Jiaheng Zhou
Zichen Wang
Hao Chen
...
Linyi Yang
Weirong Ye
Yue Zhang
Neil Zhenqiang Gong
Xingxu Xie
SILM
50
144
0
07 Jun 2023
Enhancing In-Context Learning with Answer Feedback for Multi-Span Question Answering
Zixian Huang
Jiaying Zhou
Gengyang Xiao
Gong Cheng
KELM
11
10
0
07 Jun 2023
GPT4GEO: How a Language Model Sees the World's Geography
Jonathan Roberts
Timo Lüddecke
Sowmen Das
Kai Han
Samuel Albanie
29
60
0
30 May 2023
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler
Jingxuan He
Slobodan Jenko
Martin Vechev
HILM
22
108
0
25 May 2023
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
Victoria Basmov
Yoav Goldberg
Reut Tsarfaty
ReLM
LRM
32
5
0
24 May 2023
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
Yuqi Zhu
Xiaohan Wang
Jing Chen
Shuofei Qiao
Yixin Ou
Yunzhi Yao
Shumin Deng
Huajun Chen
Ningyu Zhang
LLMAG
43
111
0
22 May 2023