ResearchTrend.AI
The Generative AI Paradox: "What It Can Create, It May Not Understand"
arXiv: 2311.00059

31 October 2023
Peter West
Ximing Lu
Nouha Dziri
Faeze Brahman
Linjie Li
Jena D. Hwang
Liwei Jiang
Jillian R. Fisher
Abhilasha Ravichander
Khyathi Raghavi Chandu
Benjamin Newman
Pang Wei Koh
Allyson Ettinger
Yejin Choi
    AIMat

Papers citing "The Generative AI Paradox: 'What It Can Create, It May Not Understand'"

Showing 50 of 59 citing papers.

Consistency in Language Models: Current Landscape, Challenges, and Future Directions
Jekaterina Novikova
Carol Anderson
Borhane Blili-Hamelin
Subhabrata Majumdar
HILM
69
0
0
01 May 2025

When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
Md Fahim Anjum
LRM
25
0
0
30 Apr 2025

Assesing LLMs in Art Contexts: Critique Generation and Theory of Mind Evaluation
Takaya Arita
Wenxian Zheng
Reiji Suzuki
Fuminori Akiba
22
0
0
17 Apr 2025

Aligned Probing: Relating Toxic Behavior and Model Internals
Andreas Waldis
Vagrant Gautam
Anne Lauscher
Dietrich Klakow
Iryna Gurevych
36
0
0
17 Mar 2025

Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding
Imran Kabir
Md. Alimoor Reza
Syed Masum Billah
ReLM
VLM
LRM
78
0
0
16 Mar 2025

All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen
LRM
50
0
0
24 Feb 2025

Evolution and The Knightian Blindspot of Machine Learning
Joel Lehman
Elliot Meyerson
Tarek El-Gaaly
Kenneth O. Stanley
Tarin Ziyaee
78
1
0
22 Jan 2025

Text-to-SQL Calibration: No Need to Ask -- Just Rescale Model Probabilities
Ashwin Ramachandran
Sunita Sarawagi
61
2
0
23 Nov 2024

A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs
Ryosuke Sonoda
Ramya Srinivasan
56
1
0
22 Oct 2024

MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
37
2
0
17 Oct 2024

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao
Wenxuan Ding
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
25
0
0
14 Oct 2024

Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh
Eunsu Kim
Jiseon Kim
Wenda Xu
Inha Cha
William Yang Wang
Alice H. Oh
21
0
0
09 Oct 2024

CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints
Anirudh Atmakuru
Jatin Nainani
Rohith Siddhartha Reddy Bheemreddy
Anirudh Lakkaraju
Zonghai Yao
Hamed Zamani
Haw-Shiuan Chang
60
2
0
05 Oct 2024

Generating bilingual example sentences with large language models as lexicography assistants
Raphael Merx
Ekaterina Vylomova
Kemal Kurniawan
23
2
0
04 Oct 2024

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
34
0
0
04 Oct 2024

Setting the AI Agenda -- Evidence from Sweden in the ChatGPT Era
Bastiaan Bruinsma
Annika Fredén
Kajsa Hansson
Moa Johansson
Pasko Kisić-Merino
Denitsa Saynova
31
0
0
25 Sep 2024

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
Andreas Stephan
D. Zhu
Matthias Aßenmacher
Xiaoyu Shen
Benjamin Roth
ELM
45
4
0
06 Sep 2024

Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic
Xin Zheng
Jie Lou
Boxi Cao
Xueru Wen
Yuqiu Ji
Hongyu Lin
Y. Lu
Xianpei Han
Debing Zhang
Le Sun
LLMAG
OffRL
LRM
ReLM
KELM
28
13
1
29 Aug 2024

Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Yang Wang
Naomi Saphra
26
10
0
22 Jul 2024

Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence
Teo Susnjak
Timothy R. McIntosh
A. Barczak
N. Reyes
Tong Liu
Paul Watters
Malka N. Halgamuge
25
3
0
04 Jul 2024

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
36
2
0
02 Jul 2024

CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans
Yash Kumar Lal
Vanya Cohen
Nathanael Chambers
Niranjan Balasubramanian
Raymond Mooney
ELM
LRM
ReLM
27
3
0
22 Jun 2024

BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
81
3
0
19 Jun 2024

Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs
Muhammad Arslan Manzoor
Yuxia Wang
Minghan Wang
Preslav Nakov
24
0
0
17 Jun 2024

Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models
Kevin Leyton-Brown
Y. Shoham
ELM
14
0
0
16 Jun 2024

Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
31
0
0
13 Jun 2024

Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs
Akhila Yerukola
Saujas Vaduguru
Daniel Fried
Maarten Sap
24
1
0
14 May 2024

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
Yuxia Wang
Minghan Wang
Hasan Iqbal
Georgi Georgiev
Jiahui Geng
Preslav Nakov
HILM
36
13
0
09 May 2024

Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang
Muhammad Khalifa
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
LRM
KELM
ReLM
23
31
0
26 Apr 2024

CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models
Minjung Shin
Donghyun Kim
Jeh-Kwang Ryu
LRM
14
1
0
18 Apr 2024

Language Models Still Struggle to Zero-shot Reason about Time Series
Mike A. Merrill
Mingtian Tan
Vinayak Gupta
Tom Hartvigsen
Tim Althoff
AI4TS
LRM
30
26
0
17 Apr 2024

SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses
Dongwei Jiang
Jingyu Zhang
Orion Weller
Nathaniel Weir
Benjamin Van Durme
Daniel Khashabi
53
1
0
04 Apr 2024

Auxiliary task demands mask the capabilities of smaller language models
Jennifer Hu
Michael C. Frank
ELM
26
25
0
03 Apr 2024

Can Language Models Recognize Convincing Arguments?
Paula Rescala
Manoel Horta Ribeiro
Tiancheng Hu
Robert West
LRM
24
15
0
31 Mar 2024

Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning
Zhouhang Xie
Bodhisattwa Prasad Majumder
Mengjie Zhao
Yoshinori Maeda
Keiichi Yamada
Hiromi Wakaki
Julian McAuley
32
3
0
23 Mar 2024

Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus
Seungpil Lee
Woochang Sim
Donghyeon Shin
Sanha Hwang
Wongyu Seo
Jiwon Park
Seokki Lee
Sejin Kim
Sundong Kim
LRM
37
19
0
18 Mar 2024

Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu
Wenjie Wang
Yongqi Li
Hanwang Zhang
Liqiang Nie
Tat-Seng Chua
31
7
0
07 Mar 2024

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
37
43
0
22 Feb 2024

A Critical Evaluation of AI Feedback for Aligning Large Language Models
Archit Sharma
Sedrick Scott Keh
Eric Mitchell
Chelsea Finn
Kushal Arora
Thomas Kollar
ALM
LLMAG
18
23
0
19 Feb 2024

When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
Ziru Chen
Michael White
Raymond Mooney
Ali Payani
Yu-Chuan Su
Huan Sun
LLMAG
75
33
0
16 Feb 2024

Quantifying the Persona Effect in LLM Simulations
Tiancheng Hu
Nigel Collier
14
49
0
16 Feb 2024

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
Juhyun Oh
Eunsu Kim
Inha Cha
Alice H. Oh
ELM
26
7
0
09 Feb 2024

OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
Hainiu Xu
Runcong Zhao
Lixing Zhu
Jinhua Du
Yulan He
68
18
0
08 Feb 2024

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
18
77
0
01 Feb 2024

NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Manav Singhal
Tushar Aggarwal
Abhijeet Awasthi
Nagarajan Natarajan
Aditya Kanade
24
12
0
29 Jan 2024

Code Simulation Challenges for Large Language Models
Emanuele La Malfa
Christoph Weinhuber
Orazio Torre
Fangru Lin
Samuele Marro
Anthony Cohn
Nigel Shadbolt
Michael Wooldridge
LLMAG
LRM
17
8
0
17 Jan 2024

CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
Yujun Mao
Yoon Kim
Yilun Zhou
LRM
ReLM
12
17
0
13 Jan 2024

Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
Xu Huang
Zhirui Zhang
Xiang Geng
Yichao Du
Jiajun Chen
Shujian Huang
40
7
0
12 Jan 2024

Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
Haz Sameen Shahgir
Xianghao Kong
Greg Ver Steeg
Yue Dong
8
5
0
22 Dec 2023

CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
52
44
0
30 Nov 2023