BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

13 June 2023
Mehran Kazemi
Quan Yuan
Deepti Bhatia
Najoung Kim
Xin Xu
Vaiva Imbrasaite
Deepak Ramachandran
    LRM

Papers citing "BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information"

43 / 43 papers shown
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
68
0
0
01 Apr 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
85
30
0
25 Mar 2025
MastermindEval: A Simple But Scalable Reasoning Benchmark
Jonas Golde
Patrick Haller
Fabio Barth
Alan Akbik
LRM
ReLM
ELM
51
2
0
07 Mar 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
117
4
0
26 Feb 2025
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning
Frederikus Hudi
Genta Indra Winata
Ruochen Zhang
Alham Fikri Aji
ReLM
LRM
80
2
0
25 Feb 2025
Logic Haystacks: Probing LLMs Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)
Damien Sileo
RALM
LRM
36
0
0
24 Feb 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen
Guangtao Zeng
Zhenting Qi
Zhang-Wei Hong
Zhenfang Chen
Wei Lu
G. Wornell
Subhro Das
David D. Cox
Chuang Gan
LLMAG
LRM
127
5
0
04 Feb 2025
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang
Liqiang Jing
Vibhav Gogate
116
2
0
19 Dec 2024
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie
Yangsibo Huang
Chiyuan Zhang
Da Yu
Xinyun Chen
Bill Yuchen Lin
Bo Li
Badih Ghazi
Ravi Kumar
LRM
45
20
0
30 Oct 2024
Open Domain Question Answering with Conflicting Contexts
Siyi Liu
Qiang Ning
Kishaloy Halder
Wei Xiao
Zheng Qi
...
Yi Zhang
Neha Anna John
Bonan Min
Yassine Benajiba
Dan Roth
LLMAG
63
2
0
16 Oct 2024
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao
Hanze Dong
Amrita Saha
Caiming Xiong
Doyen Sahoo
LRM
27
3
0
10 Oct 2024
Quantifying Generalization Complexity for Large Language Models
Zhenting Qi
Hongyin Luo
Xuliang Huang
Zhuokai Zhao
Yibo Jiang
Xiangjun Fan
Himabindu Lakkaraju
James Glass
LRM
ELM
26
5
0
02 Oct 2024
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data
Jiaming Zhou
Abbas Ghaddar
Ge Zhang
Liheng Ma
Yaochen Hu
Soumyasundar Pal
Mark J. Coates
Bin Wang
Yingxue Zhang
Jianye Hao
ReLM
LRM
35
4
0
19 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
M. Zhang
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
40
3
0
19 Sep 2024
Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?
Nemika Tyagi
Mihir Parmar
Mohith Kulkarni
Aswin Rrv
Nisarg Patel
Mutsumi Nakamura
Arindam Mitra
Chitta Baral
LRM
35
6
0
20 Jul 2024
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
Haritz Puerto
Tilek Chubakov
Xiaodan Zhu
Harish Tayyar Madabushi
Iryna Gurevych
ReLM
LRM
39
9
1
03 Jul 2024
Belief Revision: The Adaptability of Large Language Models Reasoning
Bryan Wilie
Samuel Cahyawijaya
Etsuko Ishii
Junxian He
Pascale Fung
KELM
LRM
34
1
0
28 Jun 2024
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
Nisarg Patel
Mohith Kulkarni
Mihir Parmar
Aashna Budhiraja
Mutsumi Nakamura
Neeraj Varshney
Chitta Baral
ELM
LRM
33
6
0
24 Jun 2024
Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars
Damien Sileo
LRM
ReLM
23
2
0
16 Jun 2024
ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi
Nishanth Dikkala
Ankit Anand
Petar Dević
Ishita Dasgupta
...
Bahare Fatemi
Pranjal Awasthi
Dee Guo
Sreenivas Gollapudi
Ahmed Qureshi
LRM
VLM
34
13
0
13 Jun 2024
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi
Mehran Kazemi
Anton Tsitsulin
Karishma Malkan
Jinyeong Yim
John Palowitch
Sungyong Seo
Jonathan J. Halcrow
Bryan Perozzi
LRM
35
26
0
13 Jun 2024
Are LLMs classical or nonmonotonic reasoners? Lessons from generics
Alina Leidinger
R. Rooij
Ekaterina Shutova
LRM
26
3
0
05 Jun 2024
A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters
Long Hei Matthew Lam
Ramya Keerthy Thatikonda
Ehsan Shareghi
ELM
LRM
40
1
0
01 Jun 2024
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ELM
ReLM
LRM
11
24
0
17 Feb 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi
Wenxiang Chen
Boyang Hong
Senjie Jin
Rui Zheng
...
Xinbo Zhang
Peng Sun
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
32
20
0
08 Feb 2024
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs
Haritz Puerto
Martin Tutek
Somak Aditya
Xiaodan Zhu
Iryna Gurevych
ReCod
ReLM
LRM
43
9
0
18 Jan 2024
GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning
Mehran Kazemi
Hamidreza Alvari
Ankit Anand
Jialin Wu
Xi Chen
Radu Soricut
LRM
ReLM
20
53
0
19 Dec 2023
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
E. Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
22
76
0
17 Dec 2023
TaskLAMA: Probing the Complex Task Understanding of Language Models
Quan Yuan
Mehran Kazemi
Xinyuan Xu
Isaac Noble
Vaiva Imbrasaite
Deepak Ramachandran
LRM
20
10
0
29 Aug 2023
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting
Xi Ye
Qiaochu Chen
Işıl Dillig
Greg Durrett
ReLM
ReCod
LRM
33
62
0
16 May 2023
Entity Tracking in Language Models
Najoung Kim
Sebastian Schuster
50
16
0
03 May 2023
Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise
Giwon Hong
Jeonghwan Kim
Junmo Kang
Sung-Hyon Myaeng
Joyce Jiyoung Whang
RALM
AAML
22
19
0
02 May 2023
Natural Language Deduction with Incomplete Information
Zayne Sprague
Kaj Bostrom
Swarat Chaudhuri
Greg Durrett
LRM
41
17
0
01 Nov 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
275
0
03 Oct 2022
FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han
Hailey Schoelkopf
Yilun Zhao
Zhenting Qi
Martin Riddell
...
Yingbo Zhou
Caiming Xiong
Rex Ying
Arman Cohan
Dragomir R. Radev
ReLM
LRM
26
91
0
02 Sep 2022
Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions
Emily Allaway
Jena D. Hwang
Chandra Bhagavatula
Kathleen McKeown
Doug Downey
Yejin Choi
LRM
42
20
0
23 May 2022
On the Paradox of Learning to Reason from Data
Honghua Zhang
Liunian Harold Li
Tao Meng
Kai-Wei Chang
Guy Van den Broeck
NAI
ReLM
OOD
LRM
132
103
0
23 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,448
0
28 Jan 2022
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
Haitian Sun
William W. Cohen
Ruslan Salakhutdinov
59
33
0
13 Oct 2021
RuleBert: Teaching Soft Rules to Pre-trained Language Models
Mohammed Saeed
N. Ahmadi
Preslav Nakov
Paolo Papotti
LRM
245
31
0
24 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,843
0
18 Apr 2021
Explaining Answers with Entailment Trees
Bhavana Dalvi
Peter Alexander Jansen
Oyvind Tafjord
Zhengnan Xie
Hannah Smith
Leighanna Pipatanangkura
Peter Clark
ReLM
FAtt
LRM
237
184
0
17 Apr 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
245
671
0
06 Jan 2021