arXiv:2210.09261
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"
50 / 1,087 papers shown
Title
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Yeganeh Kordi
Nihal V. Nayak
Max Zuo
Ilana Nguyen
Stephen H. Bach
54
0
0
26 Nov 2025
More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering
Duc Anh Vu
T. Nguyen
Cong-Duy Nguyen
Viet-Anh Nguyen
Anh Tuan Luu
FaML
LRM
198
0
0
25 Nov 2025
A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization
Ke Chen
Yifeng Wang
Hassan Almosapeeh
Haohan Wang
80
0
0
25 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
61
0
0
24 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG
MU
MoE
LRM
260
0
0
23 Nov 2025
ELPO: Ensemble Learning Based Prompt Optimization for Large Language Models
Qing Zhang
Bing Xu
X. R. Zhang
Yifan Shi
Yang Li
...
Ngai Wong
Yijie Chen
Hong Dai
X. Chen
M. Zhang
44
0
0
20 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
166
1
0
19 Nov 2025
Bootstrapping LLMs via Preference-Based Policy Optimization
Chen Jia
OffRL
168
0
0
17 Nov 2025
Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models
Manh Trong Nguyen
D. Nguyen
Dai Do
Svetha Venkatesh
Hung Le
68
0
0
13 Nov 2025
AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment
Ruibo Deng
Duanyu Feng
Wenqiang Lei
119
0
0
12 Nov 2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
135
0
0
10 Nov 2025
C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning
Antonios Valkanas
Soumyasundar Pal
Pavel Rumiantsev
Yingxue Zhang
Mark Coates
76
0
0
10 Nov 2025
LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging
Seungeon Lee
Soumi Das
Manish Gupta
Krishna P. Gummadi
MoMe
392
1
0
10 Nov 2025
Chasing Consistency: Quantifying and Optimizing Human-Model Alignment in Chain-of-Thought Reasoning
Boxuan Wang
Z. Li
Xinmiao Huang
Xiaowei Huang
Yi Dong
LRM
40
0
0
09 Nov 2025
Mixtures of SubExperts for Large Language Continual Learning
Haeyong Kang
CLL
KELM
MoE
131
0
0
09 Nov 2025
Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models
Cong-Thanh Do
R. Doddipatla
Kate Knill
LRM
192
0
0
07 Nov 2025
Motif 2 12.7B technical report
Junghwan Lim
S. W. Lee
Dongseok Kim
Taehyun Kim
Eunhwan Park
...
Kungyu Lee
Dongpin Oh
Yeongjae Park
Bokki Ryu
Dongjoo Weon
32
0
0
07 Nov 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Jingqi Tong
Yurong Mou
Hangcheng Li
Mingzhe Li
Y. Yang
...
Y. Zheng
Xinchi Chen
Jun Zhao
Xuanjing Huang
Xipeng Qiu
VGen
LRM
277
6
0
06 Nov 2025
Watermarking Discrete Diffusion Language Models
Avi Bagchi
Akhil Bhimaraju
Moulik Choraria
Daniel Alabi
Lav Varshney
68
0
0
03 Nov 2025
FEval-TTC: Fair Evaluation Protocol for Test-Time Compute
Pavel Rumiantsev
Soumyasundar Pal
Yingxue Zhang
Mark Coates
84
0
0
03 Nov 2025
The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation
İbrahim Ethem Deveci
Duygu Ataman
ReLM
ALM
ELM
LRM
131
0
0
03 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM
VLM
370
1
0
31 Oct 2025
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan
Alexander Matt Turner
Mark Kurzeja
David Elson
Rohin Shah
179
0
0
31 Oct 2025
Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
Yuxin Li
Minghao Liu
Ruida Wang
Wenzhao Ji
Zhitao He
Rui Pan
J. Huang
Tong Zhang
Yi R. Fung
77
0
0
30 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
88
2
0
30 Oct 2025
Zero Reinforcement Learning Towards General Domains
Yuyuan Zeng
Yufei Huang
Can Xu
Qingfeng Sun
Jianfeng Yan
Guanghui Xu
Tao Yang
Fengzong Lian
OffRL
ReLM
LRM
AI4CE
121
0
0
29 Oct 2025
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training
Jiarui Qin
Yunjia Xi
Junjie Huang
Renting Rui
D. Yin
Weiwen Liu
Yong Yu
W. Zhang
Xing Sun
64
0
0
28 Oct 2025
Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Bohong Wu
Mengzhao Chen
Xiang Luo
Shen Yan
Qifan Yu
...
Hongrui Zhan
Zheng Zhong
Xun Zhou
Siyuan Qiao
Xingyan Bin
88
2
0
28 Oct 2025
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Deepon Halder
Alan Saji
Thanmay Jayakumar
Ratish Puduppully
Anoop Kunchukuttan
Raj Dabre
ReLM
ELM
LRM
206
0
0
28 Oct 2025
A Survey on LLM Mid-Training
Chengying Tu
Xuemiao Zhang
Rongxiang Weng
Rumei Li
Chen Zhang
Yang Bai
Hongfei Yan
Jingang Wang
Xunliang Cai
OffRL
LRM
173
0
0
27 Oct 2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Yixing Chen
Yiding Wang
Siqi Zhu
Haofei Yu
Tao Feng
Muhan Zhang
M. Patwary
Jiaxuan You
LLMAG
LRM
251
4
0
27 Oct 2025
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection
Yusu Qian
Cheng Wan
Chao Jia
Yinfei Yang
Qingyu Zhao
Zhe Gan
LRM
ReLM
346
1
0
27 Oct 2025
When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs
Keyu Wang
Tian Lyu
Guinan Su
Jonas Geiping
L. Yin
Marco Canini
Shiwei Liu
LRM
89
0
0
25 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM
CLL
257
1
0
25 Oct 2025
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
Nuo Chen
Zehua Li
Keqin Bao
Junyang Lin
Dayiheng Liu
LLMAG
LRM
66
0
0
24 Oct 2025
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor
Victor Lu
Vassil Tashev
Armstrong Foundjem
Aishwarya Ramasethu
...
Chris Knotz
Kongtao Chen
Alicia Parrish
Anka Reuel
Heather Frase
101
0
0
24 Oct 2025
LM-mixup: Text Data Augmentation via Language Model based Mixup
Zhijie Deng
Zhouan Shen
Ling Li
Yao Zhou
Zhaowei Zhu
Yanji He
Wei Wang
Jiaheng Wei
56
0
0
23 Oct 2025
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Jiawei Zhang
Andrew Estornell
David D. Baek
B. Li
Xiaojun Xu
108
0
0
20 Oct 2025
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen
Akari Asai
Luke Zettlemoyer
Hannaneh Hajishirzi
Faeze Brahman
OffRL
HILM
LRM
121
0
0
20 Oct 2025
Mapping Post-Training Forgetting in Language Models at Scale
Jackson Harmon
Andreas Hochlehnert
Matthias Bethge
Ameya Prabhu
CLL
KELM
77
0
0
20 Oct 2025
Automatic Prompt Generation via Adaptive Selection of Prompting Techniques
Yohei Ikenoue
Hitomi Tashiro
Shigeru Kuroyanagi
48
0
0
20 Oct 2025
Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging
Tiancheng Hu
Benjamin Minixhofer
Nigel Collier
MoMe
410
1
0
20 Oct 2025
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
Zhehao Zhang
Weijie Xu
Shixian Cui
Chandan K. Reddy
AAML
LRM
76
0
0
17 Oct 2025
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Heecheol Yun
Kwangmin Ki
J. H. Lee
Eunho Yang
65
0
0
17 Oct 2025
SimKO: Simple Pass@K Policy Optimization
Ruotian Peng
Yi Ren
Zhouliang Yu
Weiyang Liu
Yandong Wen
156
2
0
16 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat
AI4TS
LRM
199
0
0
16 Oct 2025
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen
Sihang Zhou
Ke Liang
Duanyang Yuan
Haoyuan Chen
Xiaoyu Sun
Linyuan Meng
Xinwang Liu
ReLM
LRM
157
0
0
15 Oct 2025
NOSA: Native and Offloadable Sparse Attention
Yuxiang Huang
Chaojun Xiao
Xu Han
Zhiyuan Liu
MQ
128
0
0
15 Oct 2025
GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models
Chen Zheng
Y. Cai
Deyi Liu
Jin Ma
Yiyuan Ma
Y. Yang
Jing Liu
Yutao Zeng
Xun Zhou
Siyuan Qiao
MoE
104
0
0
15 Oct 2025
LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization
Yuanchen Wu
Saurabh Verma
Justin Lee
Fangzhou Xiong
Poppy Zhang
Amel Awadelkarim
Xu Chen
Yubai Yuan
Shawndra Hill
49
0
0
14 Oct 2025