2210.09261
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
Papers citing
"Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"
50 / 1,087 papers shown
Title
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
Daniel Kaiser
Arnoldo Frigessi
Ali Ramezani-Kebrya
Benjamin Ricaud
LRM
91
0
0
22 Sep 2025
Codifying Natural Langauge Tasks
Haoyang Chen
Kumiko Tanaka-Ishii
ELM
96
0
0
22 Sep 2025
Probabilistic Token Alignment for Large Language Model Fusion
Runjia Zeng
James Liang
Cheng Han
Zhiwen Cao
Jiahao Liu
...
Yingjie Victor Chen
Lifu Huang
Tong Geng
Qifan Wang
Dongfang Liu
108
1
0
21 Sep 2025
GPO: Learning from Critical Steps to Improve LLM Reasoning
Jiahao Yu
Zelei Cheng
Xian Wu
Xinyu Xing
LRM
135
2
0
19 Sep 2025
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
Tomoya Yamashita
Akira Ito
Yuuki Yamanaka
Masanori Yamada
Takayuki Miura
Toshiki Shibahara
MU
KELM
76
1
0
19 Sep 2025
CARGO: A Framework for Confidence-Aware Routing of Large Language Models
Amine Barrak
Yosr Fourati
Michael Olchawa
Emna Ksontini
Khalil Zoghlami
105
1
0
18 Sep 2025
Masked Diffusion Models as Energy Minimization
Sitong Chen
Shen Nie
Jiacheng Sun
Zijin Feng
Zhenguo Li
Ji-Rong Wen
Chongxuan Li
DiffM
OT
329
0
0
17 Sep 2025
Enhancing Multi-Agent Debate System Performance via Confidence Expression
Zijie Lin
Bryan Hooi
LLMAG
87
1
0
17 Sep 2025
Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
Haodong Zhao
Chenyan Zhao
Yansi Li
Zhuosheng Zhang
Gongshen Liu
LRM
64
1
0
17 Sep 2025
DSFT: Inspiring Diffusion Large Language Models to Comprehend Mathematical and Logical Patterns
Ranfei Chen
Ming Chen
DiffM
AI4CE
41
0
0
17 Sep 2025
ZERA: Zero-init Instruction Evolving Refinement Agent - From Zero Instructions to Structured Prompts via Principle-based Optimization
Seungyoun Yi
Minsoo Khang
Sungrae Park
LLMAG
44
0
0
17 Sep 2025
Instance-level Randomization: Toward More Stable LLM Evaluations
Yiyang Li
Y. Wu
Ying Luo
Liangtai Sun
Zishu Qin
Lin Qiu
Xuezhi Cao
Xunliang Cai
96
0
0
16 Sep 2025
Preservation of Language Understanding Capabilities in Speech-aware Large Language Models
Marek Kubis
Paweł Skórzewski
Iwona Christop
Mateusz Czyżnikiewicz
Jakub Kubiak
Łukasz Bondaruk
Marcin Lewandowski
AuLLM
ELM
142
0
0
15 Sep 2025
Harnessing Optimization Dynamics for Curvature-Informed Model Merging
Pouria Mahdavinia
Hamed Mahdavi
Niloofar Mireshghallah
M. Mahdavi
MoMe
143
0
0
14 Sep 2025
Fluid Language Model Benchmarking
Valentin Hofmann
David Heineman
Ian H. Magnusson
Kyle Lo
Jesse Dodge
Maarten Sap
Pang Wei Koh
Chun Wang
Hannaneh Hajishirzi
Noah A. Smith
89
6
0
14 Sep 2025
Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction
Yijun Liu
Yixuan Wang
Yuzhuang Xu
Shiyu Ji
Yang Xu
Qingfu Zhu
Wanxiang Che
108
0
0
13 Sep 2025
PaVeRL-SQL: Text-to-SQL via Partial-Match Rewards and Verbal Reinforcement Learning
Heng Hao
Wenjun Hu
Oxana Verkholyak
Davoud Ataee Tarzanagh
Baruch Gutow
Sima Didari
Masoud Faraki
H. Moon
Seungjai Min
78
0
0
08 Sep 2025
From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
Jiaxiang Chen
Zhuo Wang
Mingxi Zou
Zhucong Li
Zhijian Zhou
Song Wang
Zenglin Xu
LRM
80
0
0
08 Sep 2025
Mitigating Spurious Correlations Between Question and Answer via Chain-of-Thought Correctness Perception Distillation
Hongyan Xie
Yitong Yao
Yikun Ban
Zixuan Huang
Deqing Wang
Zhenhe Wu
Haoxiang Su
Chao Wang
Shuangyong Song
LRM
147
2
0
06 Sep 2025
Symbolic Graphics Programming with Large Language Models
Yamei Chen
H. Zhang
Yangyi Huang
Zeju Qiu
Kaipeng Zhang
Yandong Wen
Weiyang Liu
115
1
0
05 Sep 2025
What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
Yuan Sui
Yanming Zhang
Yi Liao
Yu Gu
Guohua Tang
Zhongqian Sun
Wei Yang
Xu Cheng
LLMAG
205
0
0
05 Sep 2025
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Cheng Li
Jiexiong Liu
Yixuan Chen
Jie Ji
MoE
62
0
0
05 Sep 2025
Characterizing Fitness Landscape Structures in Prompt Engineering
Arend Hintze
68
0
0
04 Sep 2025
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Yang Wang
Chenghao Xiao
Chia-Yi Hsiao
Zi Yan Chang
Chi-Li Chen
Tyler Loakman
Chenghua Lin
195
1
0
04 Sep 2025
Delta Activations: A Representation for Finetuned Large Language Models
Zhiqiu Xu
Amish Sethi
Mayur Naik
Ser-Nam Lim
122
0
0
04 Sep 2025
Why Language Models Hallucinate
Adam Tauman Kalai
Ofir Nachum
Santosh Vempala
Edwin Zhang
HILM
LRM
229
63
0
04 Sep 2025
IPA: An Information-Reconstructive Input Projection Framework for Efficient Foundation Model Adaptation
Yuan Yin
Shashanka Venkataramanan
Tuan-Hung Vu
Andrei Bursuc
Matthieu Cord
92
0
0
04 Sep 2025
Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning
Sugyeong Eo
Jungjun Lee
Chanjun Park
Heuiseok Lim
MoE
80
0
0
03 Sep 2025
Implicit Reasoning in Large Language Models: A Comprehensive Survey
Jindong Li
Yali Fu
Li Fan
Jiahong Liu
Yao Shu
Chengwei Qin
Menglin Yang
Irwin King
Rex Ying
OffRL
LRM
AI4CE
159
10
0
02 Sep 2025
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Naman D. Singh
Maximilian Müller
Francesco Croce
Matthias Hein
MU
KELM
CLL
171
4
0
02 Sep 2025
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Mohammad Zbeeb
Hasan Hammoud
Bernard Ghanem
LRM
148
0
0
01 Sep 2025
LongCat-Flash Technical Report
M-A-P Team
Bayan
Bei Li
Bingye Lei
Bo Wang
...
Rongxiang Weng
Ruichen Shao
Rumei Li
Shizhe Wu
Shuai Liang
MLLM
MoE
VLM
344
12
0
01 Sep 2025
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Yanxiao Zhao
Yaqian Li
Zihao Bo
Rinyoichi Takezoe
Haojia Hui
Mo Guang
Lei Ren
Xiaolin Qin
Kaiwen Long
LRM
82
0
0
31 Aug 2025
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Yi Liao
Yu Gu
Yuan Sui
Zining Zhu
Yifan Lu
Guohua Tang
Zhongqian Sun
Wei Yang
OffRL
ReLM
LM&Ro
LRM
121
1
0
29 Aug 2025
Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
Haoze Wu
Cheng Wang
Wenshuo Zhao
Junxian He
OffRL
97
3
0
28 Aug 2025
TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning
Simin Ma
Shujian Liu
Jun Tan
Yebowen Hu
Song Wang
Sathish Indurthi
Sanqiang Zhao
Liwei Wu
Jianbing Han
Kaiqiang Song
64
0
0
28 Aug 2025
Symphony: A Decentralized Multi-Agent Framework for Scalable Collective Intelligence
Ji Wang
Kashing Chen
Xinyuan Song
Ke Zhang
Lynn Ai
Eric Yang
Bill Shi
49
0
0
27 Aug 2025
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Zihao Huang
Yu Bao
Qiyang Min
S. Chen
Ran Guo
...
Defa Zhu
Yutao Zeng
Banggu Wu
Xun Zhou
Siyuan Qiao
MoE
132
1
0
26 Aug 2025
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap
Jun Wang
Ninglun Gu
Kailai Zhang
Zijiao Zhang
Yelun Bao
...
Liwei Liu
Yihuan Liu
Pengyong Li
Gary G. Yen
Junchi Yan
ALM
ELM
168
0
0
26 Aug 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM
LRM
230
193
0
25 Aug 2025
UniAPO: Unified Multimodal Automated Prompt Optimization
Qipeng Zhu
Yanzhe Chen
Huasong Zhong
Yan Li
Jie Chen
Zhixin Zhang
Junping Zhang
Zhenheng Yang
LLMAG
97
1
0
25 Aug 2025
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Maojia Song
Tej Deep Pala
Weisheng Jin
Amir Zadeh
Chuan Li
Dorien Herremans
Soujanya Poria
LLMAG
117
3
0
24 Aug 2025
CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency
Zhanming Shen
Hao Chen
Yulei Tang
Shaolin Zhu
Wentao Ye
Xiaomeng Hu
Haobo Wang
Gang Chen
Junbo Zhao
SyDa
ALM
88
0
0
22 Aug 2025
Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective
Tianyao Shi
Yi Ding
MQ
102
3
0
22 Aug 2025
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
Yufeng Zhao
Junnan Liu
Hongwei Liu
D. Zhu
Yuan Shen
Songyang Zhang
Kai Chen
LRM
84
0
0
21 Aug 2025
Dream 7B: Diffusion Large Language Models
Jiacheng Ye
Zhihui Xie
Lin Zheng
Lei Li
Zirui Wu
Xin Jiang
Zhenguo Li
Lingpeng Kong
DiffM
VLM
540
91
0
21 Aug 2025
In-Context Iterative Policy Improvement for Dynamic Manipulation
Mark Van der Merwe
Devesh Jha
LM&Ro
OffRL
LRM
96
0
0
20 Aug 2025
ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models
Yuanfeng Xu
Zehui Dai
Jian Liang
Jiapeng Guan
Guangrun Wang
Liang Lin
Xiaohui Lv
LLMAG
LRM
96
0
0
17 Aug 2025
ZigzagAttention: Efficient Long-Context Inference with Exclusive Retrieval and Streaming Heads
Zhuorui Liu
Chen Zhang
Dawei Song
36
1
0
17 Aug 2025
Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
Benjamin Pikus
Pratyush Ranjan Tiwari
Burton Ye
208
4
0
15 Aug 2025