Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

17 October 2022


Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 1,103 papers shown

UniAPO: Unified Multimodal Automated Prompt Optimization

143

25 Aug 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

...

305

279

25 Aug 2025

LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions

172

24 Aug 2025

CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency

109

22 Aug 2025

Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective

Tianyao Shi

Yi Ding

133

22 Aug 2025

Dream 7B: Diffusion Large Language Models

1.0K

110

21 Aug 2025

Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis

133

21 Aug 2025

In-Context Iterative Policy Improvement for Dynamic Manipulation

Mark Van der Merwe

Devesh Jha

LM&Ro OffRL LRM

131

20 Aug 2025

ZigzagAttention: Efficient Long-Context Inference with Exclusive Retrieval and Streaming Heads

Zhuorui Liu

Chen Zhang

Dawei Song

17 Aug 2025

ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models

140

17 Aug 2025

Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets

Benjamin Pikus

Pratyush Ranjan Tiwari

Burton Ye

307

15 Aug 2025

Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation

123

13 Aug 2025

mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Re

190

13 Aug 2025

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

...

176

12 Aug 2025

GreenTEA: Gradient Descent with Topic-modeling and Evolutionary Auto-prompting

Zheng Dong

Luming Shang

Gabriela Olinto

107

12 Aug 2025

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

...

137

11 Aug 2025

DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment

262

08 Aug 2025

TASE: Token Awareness and Structured Evaluation for Multilingual Language Models

115

07 Aug 2025

Align, Don't Divide: Revisiting the LoRA Architecture in Multi-Task Learning

07 Aug 2025

Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?

114

07 Aug 2025

R-Zero: Self-Evolving Reasoning LLM from Zero Data

233

07 Aug 2025

IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards

276

06 Aug 2025

Tensorized Clustered LoRA Merging for Multi-Task Interference

182

06 Aug 2025

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

...

151

05 Aug 2025

ProCut: LLM Prompt Compression via Attribution Estimation

179

04 Aug 2025

SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference

215

03 Aug 2025

LinkQA: Synthesizing Diverse QA from Multiple Seeds Strongly Linked by Knowledge Points

212

02 Aug 2025

Large-Scale Diverse Synthesis for Mid-Training

151

02 Aug 2025

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

...

185

01 Aug 2025

Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models

170

01 Aug 2025

Learning Like Humans: Resource-Efficient Federated Fine-Tuning through Cognitive Developmental Stages

184

31 Jul 2025

DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

Hui Yi Leong

Yuqing Wu

169

31 Jul 2025

Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI

328

30 Jul 2025

Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

169

30 Jul 2025

Kimi K2: Open Agentic Intelligence

...

179

28 Jul 2025

Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning

171

25 Jul 2025

Towards Effective Human-in-the-Loop Assistive AI Agents

169

24 Jul 2025

Technical Report of TeleChat2, TeleChat2.5 and T1

...

428

24 Jul 2025

Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

...

243

24 Jul 2025

Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models

386

23 Jul 2025

WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training

259

23 Jul 2025

Are LLM Belief Updates Consistent with Bayes' Theorem?

168

23 Jul 2025

Towards Compute-Optimal Many-Shot In-Context Learning

213

22 Jul 2025

A Unifying Scheme for Extractive Content Selection Tasks

144

22 Jul 2025

Metric assessment protocol in the context of answer fluctuation on MCQ tasks

129

21 Jul 2025

Quantum Machine Learning in Multi-Qubit Phase-Space Part I: Foundations

315

16 Jul 2025

A Survey of Deep Learning for Geometry Problem Solving

Jianzhe Ma

Wenxuan Wang

Qin Jin

444

16 Jul 2025

Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning

157

14 Jul 2025

RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services

...

219

13 Jul 2025

DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models

216

12 Jul 2025