CommonsenseQA 2.0: Exposing the Limits of AI through Gamification


14 January 2022
Alon Talmor
Ori Yoran
Ronan Le Bras
Chandrasekhar Bhagavatula
Yoav Goldberg
Yejin Choi
Jonathan Berant
    ELM
arXiv:2201.05320

Papers citing "CommonsenseQA 2.0: Exposing the Limits of AI through Gamification"

50 / 105 papers shown
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
57
0
0
02 May 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard
Mohit Singh Chauhan
HILM
70
0
0
27 Apr 2025
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi
A. Ahmad
Divyaksh Shukla
Ashutosh Modi
ReLM
LRM
34
0
0
14 Apr 2025
CoLa -- Learning to Interactively Collaborate with Large LMs
Abhishek Sharma
Dan Goldwasser
LLMAG
SyDa
58
0
0
03 Apr 2025
KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis
W. Zhan
Y. Wang
Nan Hu
Liming Xiao
Jingyuan Ma
...
Wenhan Ma
Rui Li
Weilin Luo
Qun Liu
Zhifang Sui
LRM
54
1
0
08 Mar 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
59
4
0
23 Feb 2025
Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset
Rawand Alfugaha
Mohammad AL-Smadi
LRM
ELM
34
0
0
19 Feb 2025
Economics of Sourcing Human Data
Sebastin Santy
Prasanta Bhattacharya
Manoel Horta Ribeiro
Kelsey Allen
Sewoong Oh
69
0
0
11 Feb 2025
PSSD: Making Large Language Models Self-denial via Human Psyche Structure
Jinzhi Liao
Zenghua Liao
Xiang Zhao
LRM
LLMAG
48
0
0
03 Feb 2025
Chained Tuning Leads to Biased Forgetting
Megan Ung
Alicia Sun
Samuel J. Bell
Bhaktipriya Radharapu
Levent Sagun
Adina Williams
CLL
KELM
84
0
0
21 Dec 2024
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Harsh Singh
Rocktim Jyoti Das
Mingfei Han
Preslav Nakov
Ivan Laptev
LM&Ro
LLMAG
67
2
0
26 Nov 2024
Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs
Shan Zhong
Jiahao Zeng
Yongxin Yu
Bohong Lin
34
1
0
09 Nov 2024
What Really is Commonsense Knowledge?
Quyet V. Do
Junze Li
Tung-Duong Vuong
Zhaowei Wang
Y. Song
Xiaojuan Ma
23
0
0
06 Nov 2024
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
Junda Wu
Xintong Li
Ruoyu Wang
Yu Xia
Yuxin Xiong
...
Xiang Chen
B. Kveton
Lina Yao
Jingbo Shang
Julian McAuley
OffRL
LRM
29
0
0
31 Oct 2024
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
Mirac Suzgun
Tayfun Gur
Federico Bianchi
Daniel E. Ho
Thomas F. Icard
Dan Jurafsky
James Zou
29
1
0
28 Oct 2024
LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang
Kai Kiat Goh
Peixin Zhang
Jun Sun
18
0
0
22 Oct 2024
Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic
Jason Chan
Robert Gaizauskas
Zhixue Zhao
ELM
AAML
LRM
25
0
0
21 Oct 2024
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
Lorenzo Pacchiardi
Marko Tesic
Lucy G. Cheke
José Hernández Orallo
31
3
0
15 Oct 2024
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
Zheng Yi Ho
Siyuan Liang
Sen Zhang
Yibing Zhan
Dacheng Tao
26
2
0
11 Oct 2024
Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives
Xinliang Frederick Zhang
Nick Beauchamp
Lu Wang
LRM
AI4CE
27
3
0
07 Oct 2024
ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering
Francesco Maria Molfese
Simone Conia
Riccardo Orlando
Roberto Navigli
ReLM
LRM
RALM
25
1
0
07 Oct 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
28
0
0
05 Oct 2024
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Hung-Ting Su
Ya-Ching Hsu
Xudong Lin
Xiang Qian Shi
Yulei Niu
Han-Yuan Hsu
Hung-yi Lee
Winston H. Hsu
LRM
31
0
0
22 Sep 2024
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
Ingo Ziegler
Abdullatif Köksal
Desmond Elliott
Hinrich Schütze
38
5
0
03 Sep 2024
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
Nitzan Bitton-Guetta
Aviv Slobodkin
Aviya Maimon
Eliya Habba
Royi Rassin
Yonatan Bitton
Idan Szpektor
Amir Globerson
Yuval Elovici
ReLM
VLM
LRM
34
5
0
28 Jul 2024
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
Yadong Zhang
Shaoguang Mao
Wenshan Wu
Yan Xia
Tao Ge
Man Lan
Furu Wei
48
2
0
08 Jul 2024
mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
38
2
0
06 Jun 2024
Scaling and evaluating sparse autoencoders
Leo Gao
Tom Dupré la Tour
Henk Tillman
Gabriel Goh
Rajan Troll
Alec Radford
Ilya Sutskever
Jan Leike
Jeffrey Wu
28
112
0
06 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
34
0
0
04 Jun 2024
Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
Lakshmi Nair
Evana Gizzi
Jivko Sinapov
MLLM
48
2
0
02 May 2024
General Purpose Verification for Chain of Thought Prompting
Robert Vacareanu
Anurag Pratik
Evangelia Spiliopoulou
Zheng Qi
Giovanni Paolini
Neha Ann John
Jie Ma
Yassine Benajiba
Miguel Ballesteros
LRM
19
7
0
30 Apr 2024
Vision-Language Model-based Physical Reasoning for Robot Liquid Perception
Wenqiang Lai
Yuan Gao
T. Lam
LRM
LM&Ro
23
5
0
10 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
36
20
0
10 Apr 2024
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang
Shaoguang Mao
Tao Ge
Xun Wang
Adrian de Wynter
Yan Xia
Wenshan Wu
Ting Song
Man Lan
Furu Wei
LRM
78
48
0
01 Apr 2024
Rule or Story, Which is a Better Commonsense Expression for Talking with Large Language Models?
Ning Bian
Xianpei Han
Hongyu Lin
Yaojie Lu
Ben He
Le Sun
26
1
0
22 Feb 2024
GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models
Sayantan Adak
Daivik Agrawal
Animesh Mukherjee
Somak Aditya
24
3
0
20 Feb 2024
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Jacky Liang
Fei Xia
Wenhao Yu
Andy Zeng
Montse Gonzalez Arenas
...
N. Heess
Kanishka Rao
Nik Stewart
Jie Tan
Carolina Parada
LM&Ro
52
32
0
18 Feb 2024
Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs
Siyuan Wang
Zhongyu Wei
Yejin Choi
Xiang Ren
ReLM
ELM
LRM
11
19
0
18 Feb 2024
PipeNet: Question Answering with Semantic Pruning over Knowledge Graphs
Ying Su
Jipeng Zhang
Yangqiu Song
Tong Zhang
25
0
0
31 Jan 2024
PhotoBot: Reference-Guided Interactive Photography via Natural Language
Oliver Limoyo
J. Li
D. Rivkin
Jonathan Kelly
Gregory Dudek
LM&Ro
11
0
0
19 Jan 2024
PathFinder: Guided Search over Multi-Step Reasoning Paths
O. Yu. Golovneva
Sean O'Brien
Ramakanth Pasunuru
Tianlu Wang
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
LRM
14
7
0
08 Dec 2023
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
Zhebin Zhang
Xinyu Zhang
Yuanhang Ren
Saijiang Shi
Meng Han
Yongkang Wu
Ruofei Lai
Zhao Cao
RALM
LRM
11
15
0
30 Nov 2023
CLOMO: Counterfactual Logical Modification with Large Language Models
Yinya Huang
Ruixin Hong
Hongming Zhang
Wei Shao
Zhicheng YANG
Dong Yu
Changshui Zhang
Xiaodan Liang
Linqi Song
LRM
23
7
0
29 Nov 2023
Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications
Zhangyin Feng
Weitao Ma
Weijiang Yu
Lei Huang
Haotian Wang
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
KELM
21
37
0
10 Nov 2023
QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering
Haochen Shi
Weiqi Wang
Tianqing Fang
Baixuan Xu
Wenxuan Ding
Xin Liu
Yangqiu Song
55
7
0
17 Oct 2023
Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
Anni Zou
Zhuosheng Zhang
Hai Zhao
Xiangru Tang
LRM
ReLM
34
1
0
10 Oct 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yu Liu
Bing Qin
Ting Liu
LRM
AI4CE
21
149
0
27 Sep 2023
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi
A. J. Chan
Sören Mindermann
Ilan Moscovitz
Alexa Y. Pan
Y. Gal
Owain Evans
J. Brauner
LLMAG
HILM
17
48
0
26 Sep 2023
Large Language Models Are Also Good Prototypical Commonsense Reasoners
Chenin Li
Qianglong Chen
Yin Zhang
Yifei Zhang
Hongxiang Yao
ReLM
LRM
ELM
14
0
0
22 Sep 2023
Gesture-Informed Robot Assistance via Foundation Models
Li-Heng Lin
Yuchen Cui
Yilun Hao
Fei Xia
Dorsa Sadigh
LM&Ro
SLR
13
19
0
06 Sep 2023