Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Annual Meeting of the Association for Computational Linguistics (ACL), 2022
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Schärli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM, ELM, LRM, ReLM

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 1,103 papers shown
LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization
Yuanchen Wu
Saurabh Verma
Justin Lee
Fangzhou Xiong
Poppy Zhang
Amel Awadelkarim
Xu Chen
Yubai Yuan
Shawndra Hill
205
3
0
10 Apr 2026
Efficient PRM Training Data Synthesis via Formal Verification
Ryo Kamoi
Yusen Zhang
Nan Zhang
Sarkar Snigdha Sarathi Das
Rui Zhang
Wenpeng Yin
Rui Zhang
LRM
359
2
0
10 Apr 2026
Attention-Aligned Reasoning for Large Language Models
Hongxiang Zhang
Yuan Tian
Tianyi Zhang
AIFin, LRM
204
1
0
30 Mar 2026
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
261
0
0
24 Dec 2025
ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning
Pritam Kadasi
Abhishek Upperwal
Mayank Singh
VLM
178
1
0
04 Dec 2025
AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Yangning Li
Shaoshen Chen
Yinghui Li
Yankai Chen
Hai-Tao Zheng
Hui Wang
Wenhao Jiang
Philip S. Yu
OffRL
227
5
0
04 Dec 2025
CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding
H. Ung
Guillaume Habault
Yasutaka Nishimura
Hao Niu
Roberto Legaspi
...
Ryoichi Kojima
Masato Taya
Chihiro Ono
A. Minamikawa
Y. Liu
221
0
0
03 Dec 2025
DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models
Olivia Kim
LRM
110
1
0
01 Dec 2025
Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models
Yujiao Yang
Jing Lian
Linhui Li
LRM
260
0
0
28 Nov 2025
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu
Zhanchao Zhou
Ruiqi Liang
Zehuan Li
Wei Wu
Jianguo Li
333
1
0
28 Nov 2025
A Rosetta Stone for AI Benchmarks
A. Ho
Jean-Stanislas Denain
David Atanasov
Samuel Albanie
Rohin Shah
ELM
329
5
0
28 Nov 2025
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Yeganeh Kordi
Nihal V. Nayak
Max Zuo
Ilana Nguyen
Stephen H. Bach
265
2
0
26 Nov 2025
Structured Prompts Improve Evaluation of Language Models
Asad Aali
Muhammad Ahmed Mohsin
Vasiliki Bikia
Arnav Singhvi
Richard Gaus
...
Sanmi Koyejo
Emily Alsentzer
Christopher Potts
N. Shah
Akshay Chaudhari
ELM, LRM
341
1
0
25 Nov 2025
More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering
Duc Anh Vu
T. Nguyen
Cong-Duy Nguyen
Viet-Anh Nguyen
Anh Tuan Luu
FaML, LRM
424
0
0
25 Nov 2025
A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization
Ke Chen
Yifeng Wang
Hassan Almosapeeh
Haohan Wang
201
1
0
25 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
208
0
0
24 Nov 2025
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
Linye Wei
Wenjue Chen
Pingzhi Tang
Xiaotian Guo
Le Ye
Runsheng Wang
Meng Li
AI4CE
151
3
0
24 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG, MU, MoE, LRM
424
0
0
23 Nov 2025
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
Haojin Yang
Rui Hu
Zequn Sun
Rui Zhou
Yujun Cai
Yiwei Wang
DiffM
153
1
0
22 Nov 2025
ELPO: Ensemble Learning Based Prompt Optimization for Large Language Models
Qing Zhang
Bing Xu
X. R. Zhang
Yifan Shi
Yang Li
...
Ngai Wong
Yijie Chen
Hong Dai
X. Chen
M. Zhang
143
0
0
20 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
427
1
0
19 Nov 2025
Bootstrapping LLMs via Preference-Based Policy Optimization
Chen Jia
OffRL
424
0
0
17 Nov 2025
Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models
Manh Trong Nguyen
D. Nguyen
Dai Do
Svetha Venkatesh
Hung Le
194
0
0
13 Nov 2025
AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment
Ruibo Deng
Duanyu Feng
Wenqiang Lei
240
0
0
12 Nov 2025
LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging
Seungeon Lee
Soumi Das
Manish Gupta
Krishna P. Gummadi
ObjD, MoMe, AI4CE
690
1
0
10 Nov 2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
259
0
0
10 Nov 2025
C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning
Antonios Valkanas
Soumyasundar Pal
Pavel Rumiantsev
Yingxue Zhang
Mark Coates
247
2
0
10 Nov 2025
Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models
Boxuan Wang
Z. Li
Xinmiao Huang
Xiaowei Huang
Yi Dong
LRM
223
1
0
09 Nov 2025
Mixtures of SubExperts for Large Language Continual Learning
Haeyong Kang
CLL, KELM, MoE
269
0
0
09 Nov 2025
Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models
Cong-Thanh Do
R. Doddipatla
Kate Knill
LRM
243
2
0
07 Nov 2025
Motif 2 12.7B technical report
Junghwan Lim
S. W. Lee
Dongseok Kim
Taehyun Kim
Eunhwan Park
...
Kungyu Lee
Dongpin Oh
Yeongjae Park
Bokki Ryu
Dongjoo Weon
159
0
0
07 Nov 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Jingqi Tong
Yurong Mou
Hangcheng Li
Mingzhe Li
Y. Yang
...
Y. Zheng
Xinchi Chen
Jun Zhao
Xuanjing Huang
Xipeng Qiu
VGen, LRM
403
20
0
06 Nov 2025
Watermarking Discrete Diffusion Language Models
Avi Bagchi
Akhil Bhimaraju
Moulik Choraria
Daniel Alabi
Lav Varshney
218
0
0
03 Nov 2025
FEval-TTC: Fair Evaluation Protocol for Test-Time Compute
Pavel Rumiantsev
Soumyasundar Pal
Yingxue Zhang
Mark Coates
128
1
0
03 Nov 2025
The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation
İbrahim Ethem Deveci
Duygu Ataman
ReLM, ALM, ELM, LRM
277
4
0
03 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
665
17
0
31 Oct 2025
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan
Alexander Matt Turner
Mark Kurzeja
David Elson
Rohin Shah
263
0
0
31 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
180
49
0
30 Oct 2025
Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
Yuxin Li
Minghao Liu
Ruida Wang
Wenzhao Ji
Zhitao He
Rui Pan
J. Huang
Tong Zhang
Yi R. Fung
175
3
0
30 Oct 2025
Zero Reinforcement Learning Towards General Domains
Yuyuan Zeng
Yufei Huang
Can Xu
Qingfeng Sun
Jianfeng Yan
Guanghui Xu
Tao Yang
Fengzong Lian
OffRL, ReLM, LRM, AI4CE
189
2
0
29 Oct 2025
Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Bohong Wu
Mengzhao Chen
Xiang Luo
Shen Yan
Qifan Yu
...
Hongrui Zhan
Zheng Zhong
Xun Zhou
Siyuan Qiao
Xingyan Bin
181
7
0
28 Oct 2025
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training
Jiarui Qin
Yunjia Xi
Junjie Huang
Renting Rui
D. Yin
Weiwen Liu
Yong Yu
W. Zhang
Xing Sun
160
1
0
28 Oct 2025
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Deepon Halder
Alan Saji
Thanmay Jayakumar
Ratish Puduppully
Anoop Kunchukuttan
Raj Dabre
ReLM, ELM, LRM
311
1
0
28 Oct 2025
A Survey on LLM Mid-Training
Chengying Tu
Xuemiao Zhang
Rongxiang Weng
Rumei Li
Chen Zhang
Yang Bai
Hongfei Yan
Jingang Wang
Xunliang Cai
OffRL, LRM
327
8
0
27 Oct 2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Yixing Chen
Yiding Wang
Siqi Zhu
Haofei Yu
Tao Feng
Muhan Zhang
M. Patwary
Jiaxuan You
LLMAG, LRM
363
19
0
27 Oct 2025
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection
Yusu Qian
Cheng Wan
Chao Jia
Yinfei Yang
Qingyu Zhao
Zhe Gan
LRM, ReLM
595
3
0
27 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM, CLL
430
3
0
25 Oct 2025
When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs
Keyu Wang
Tian Lyu
Guinan Su
Jonas Geiping
L. Yin
Marco Canini
Shiwei Liu
LRM
159
3
0
25 Oct 2025
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor
Victor Lu
Vassil Tashev
Armstrong Foundjem
Aishwarya Ramasethu
...
Chris Knotz
Kongtao Chen
Alicia Parrish
Anka Reuel
Heather Frase
186
2
0
24 Oct 2025
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
Nuo Chen
Zehua Li
Keqin Bao
Junyang Lin
Dayiheng Liu
LLMAG, LRM
154
1
0
24 Oct 2025