2402.09404
AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability
14 February 2024
Siwei Yang
Bingchen Zhao
Cihang Xie
LRM
Papers citing
"AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability"
16 / 16 papers shown
Title
Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
S.
Xinpeng Wang
Guangyao Zhai
Nassir Navab
Barbara Plank
LLMAG
51
0
0
22 Mar 2025
Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination
Qiqi Chen
Xinpeng Wang
Philipp Mondorf
Michael A. Hedderich
Barbara Plank
LRM
AI4CE
21
1
0
23 Oct 2024
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
Mengkang Hu
Tianxing Chen
Qiguang Chen
Yao Mu
Wenqi Shao
Ping Luo
LM&Ro
LLMAG
RALM
29
3
0
18 Aug 2024
When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives
Yebowen Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
Wenlin Yao
H. Foroosh
Dong Yu
Fei Liu
35
6
0
17 Jun 2024
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
Yinzhu Quan
Zefang Liu
27
6
0
13 May 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
304
0
05 Jan 2024
MindAgent: Emergent Gaming Interaction
Ran Gong
Qiuyuan Huang
Xiaojian Ma
Hoi Vo
Zane Durante
...
Zilong Zheng
Song-Chun Zhu
Demetri Terzopoulos
Fei-Fei Li
Jianfeng Gao
LM&Ro
99
62
0
18 Sep 2023
Large AI Model-Based Semantic Communications
Feibo Jiang
Yubo Peng
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
Xiaohu You
25
47
0
07 Jul 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,727
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
162
411
0
03 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
4,048
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,217
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
623
0
20 May 2021