AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

14 February 2024
Siwei Yang
Bingchen Zhao
Cihang Xie
    LRM

Papers citing "AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability"

7 / 7 papers shown
DEVAL: A Framework for Evaluating and Improving the Derivation Capability of Large Language Models
Y. Li
Qin Li
Min Zhang
Min Zhang
LRM
269
0
0
18 Nov 2025
Think Before Refusal : Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
Siyang Song
Xinpeng Wang
Guangyao Zhai
Nassir Navab
Yun Xue
LLMAG
270
6
0
22 Mar 2025
Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination
Qiqi Chen
Xinpeng Wang
Philipp Mondorf
Michael A. Hedderich
Yun Xue
LRM
AI4CE
346
1
0
23 Oct 2024
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Mengkang Hu
Tianxing Chen
Qiguang Chen
Yao Mu
Wenqi Shao
Ping Luo
LM&Ro
LLMAG
RALM
345
49
0
18 Aug 2024
When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives
Yebowen Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
Wenlin Yao
H. Foroosh
Dong Yu
Fei Liu
255
9
0
17 Jun 2024
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yinzhu Quan
Zefang Liu
265
20
0
13 May 2024
Large AI Model-Based Semantic Communications
IEEE Wireless Communications (IEEE Wireless Commun.), 2023
Feibo Jiang
Yubo Peng
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
Xiaohu You
266
123
0
07 Jul 2023