Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them


Annual Meeting of the Association for Computational Linguistics (ACL), 2022
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM, ELM, LRM, ReLM

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 1,089 papers shown
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
Zhiyuan Li
Yi-Ju Chang
Yuan Wu
LLMAG, LRM
165
6
0
28 May 2025
Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data
Christopher Lee Lübbers
129
0
0
28 May 2025
LASER: Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
224
0
0
28 May 2025
Uncertainty Unveiled: Can Exposure to More In-context Examples Mitigate Uncertainty for Large Language Models?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yifei Wang
Yu Sheng
Linjing Li
D. Zeng
157
0
0
27 May 2025
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Yongchao Chen
Y. Liu
Junwei Zhou
Yilun Hao
Jingquan Wang
Yang Zhang
Chuchu Fan
OffRL, ReLM, AI4TS, SyDa, ALM, LRM
277
4
0
27 May 2025
Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History
Qishuai Zhong
Zongmin Li
Siqi Fan
Aixin Sun
243
2
0
27 May 2025
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Peiran Wang
Ye Yu
Kai Wei
Haojing Luo
Haohan Wang
210
1
0
26 May 2025
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
Pengxiang Li
Shilin Yan
Joey Tsai
Renrui Zhang
Ruichuan An
Ziyu Guo
Xiaowei Gao
174
10
0
26 May 2025
THiNK: Can Large Language Models Think-aloud?
Yongan Yu
Mengqian Wu
Yiran Lin
Nikki G. Lobczowski
LLMAG, LRM, ELM
138
1
0
26 May 2025
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu
Yuanxiang Fan
Z. L. Jiang
Han Ding
Yongyi Hu
...
Yunan Huang
Mozhi Zhang
Pengyu Zhao
Junjie Yan
Junxian He
OffRL, NAI, SyDa, LRM, ELM
296
18
0
26 May 2025
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
Yiqun Zhang
Hao Li
Chenxu Wang
L. Chen
Qiaosheng Zhang
...
Xinrun Wang
Jia Xu
Mengwei He
Xuming He
Shuyue Hu
356
13
0
26 May 2025
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Lachlan McGinness
Peter Baumgartner
ReLM, LRM, ELM
419
1
0
26 May 2025
ARM: Adaptive Reasoning Model
Siye Wu
Jian Xie
Yikai Zhang
Aili Chen
Kai Zhang
Yu Su
Yanghua Xiao
LRM
214
11
0
26 May 2025
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan
Vincent Cohen-Addad
Dan Alistarh
Vahab Mirrokni
TDI
289
3
0
25 May 2025
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
Wenhao Liu
Mingchen Xie
Jingwen Xu
Zisu Huang
...
Changze Lv
He-Da Wang
Qi Zhang
Xiaoqing Zheng
Xuanjing Huang
397
1
0
25 May 2025
Multilingual Question Answering in Low-Resource Settings: A Dzongkha-English Benchmark for Foundation Models
Md. Tanzib Hosain
Rajan Das Gupta
Md. Kishor Morol
162
2
0
24 May 2025
MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning
Zihan Chen
Song Wang
Zhen Tan
Jundong Li
Cong Shen
OffRL
442
8
0
22 May 2025
SPaRC: A Spatial Pathfinding Reasoning Challenge
Lars Benedikt Kaesberg
Jan Philip Wahle
Terry Ruas
Bela Gipp
LRM
281
1
0
22 May 2025
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Xinghao Chen
Anhao Zhao
Heming Xia
Xuan Lu
Hanlin Wang
Yanjun Chen
Wei Zhang
Jian Wang
W. Li
Xiaoyu Shen
ReLM, LRM
353
15
0
22 May 2025
Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?
Jin Jiang
Jianing Wang
Yuchen Yan
Yang Liu
J. Zhu
Mengdi Zhang
Xunliang Cai
Liangcai Gao
ELM, LRM
150
4
0
22 May 2025
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models
Chenzhuo Zhao
Ziqian Liu
Xingda Wang
Junting Lu
Chaoyi Ruan
296
2
0
22 May 2025
NAN: A Training-Free Solution to Coefficient Estimation in Model Merging
Chongjie Si
Kangtao Lv
Jingjing Jiang
Yadao Wang
Yongwei Wang
Xiaokang Yang
Yuchi Xu
Bo Zheng
Wei Shen
MoMe
224
1
0
22 May 2025
INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling
Haochen Shi
Tianshi Zheng
Weiqi Wang
Baixuan Xu
Chunyang Li
Chunkit Chan
Tao Fan
Yangqiu Song
Qiang Yang
238
5
0
22 May 2025
Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning
Yukun Zhao
Lingyong Yan
Zhenyang Li
Shuaiqiang Wang
Zhumin Chen
Zhaochun Ren
Dawei Yin
CLL, KELM, VLM, LRM
214
0
0
21 May 2025
TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games
Yuan Yuan
Muyu He
Muhammad Adil Shahid
Jiani Huang
Ziyang Li
Li Zhang
LRM
141
0
0
21 May 2025
ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy
Gengyang Li
Yifeng Gao
Yuming Li
Yunfang Wu
ReLM, OffRL, LRM
360
13
0
21 May 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Yuchen Yan
Jin Jiang
Zhenbang Ren
Yijun Li
Xudong Cai
...
Mengdi Zhang
Jian Shao
Yongliang Shen
Jun Xiao
Yueting Zhuang
OffRL, ALM, LRM
340
8
0
21 May 2025
Generalizable Process Reward Models via Formally Verified Training Data
Ryo Kamoi
Yusen Zhang
Nan Zhang
Sarkar Snigdha Sarathi Das
Rui Zhang
OffRL, LRM
228
2
0
21 May 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team
Ao Liu
Botong Zhou
Can Xu
Chayse Zhou
...
Bingxin Qu
Bolin Ni
Boyu Wu
Chen Li
Cheng-peng Jiang
MoE, LRM, AI4CE
387
13
0
21 May 2025
Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning
Taehoon Kim
Henry Gouk
Minyoung Kim
Timothy M. Hospedales
304
0
0
21 May 2025
Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory
Hongli Zhou
Hui Huang
Ziqing Zhao
Lvyuan Han
Huicheng Wang
...
Jian Dong
Bing Xu
Conghui Zhu
Hailong Cao
Tiejun Zhao
ALM
209
5
0
21 May 2025
Enhancing LLMs via High-Knowledge Data Selection
AAAI Conference on Artificial Intelligence (AAAI), 2025
Feiyu Duan
Xuemiao Zhang
Sirui Wang
Haoran Que
Yuqi Liu
Wenge Rong
Xunliang Cai
468
3
0
20 May 2025
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu
Zhaoyi Yan
Yuanyi Wang
Yiming Zhang
Qi Zhou
Leilei Gan
Hongxia Yang
215
2
0
20 May 2025
DecIF: Improving Instruction-Following through Meta-Decomposition
Tingfeng Hui
Pengyu Zhu
Bowen Ping
Ling Tang
Guanting Dong
Yaqi Zhang
Sen Su
204
2
0
20 May 2025
Incorporating Token Usage into Prompting Strategy Evaluation
Chris Sypherd
Sergei Petrov
Sonny George
Vaishak Belle
LLMAG
148
0
0
20 May 2025
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hongru Wang
Deng Cai
Wanjun Zhong
Shijue Huang
Jeff Z. Pan
Zeming Liu
Kam-Fai Wong
ReLM, LRM
209
9
0
20 May 2025
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma
Qian Liu
Dongfu Jiang
Ge Zhang
Tianhao Shen
Wenhu Chen
AI4CE, LRM
390
66
0
20 May 2025
Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis
Haoming Huang
Yibo Yan
Jiahao Huo
Xin Zou
Xinfeng Li
Kun Wang
Xuming Hu
461
1
0
20 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM, LRM
472
6
0
19 May 2025
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Chenlin Ming
Chendi Qu
Mengzhang Cai
Qizhi Pei
Zhuoshi Pan
Yu Li
Xiaoming Duan
Lijun Wu
Bin Wang
165
2
0
19 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
510
12
0
19 May 2025
ProDS: Preference-oriented Data Selection for Instruction Tuning
Wenya Guo
Zhengkun Zhang
Xumeng Liu
Ying Zhang
Ziyu Lu
Haoze Zhu
Xubo Liu
Ruxue Yan
252
0
0
19 May 2025
Do different prompting methods yield a common task representation in language models?
Guy Davidson
Todd M. Gureckis
Brenden M. Lake
Adina Williams
331
4
0
17 May 2025
LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs
Omar Choukrani
Idriss Malek
Daniil Orel
Zhuohan Xie
Zangir Iklassov
Martin Takáč
Salem Lahlou
LLMAG, ELM, LRM
177
2
0
17 May 2025
Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
Berkcan Kapusuzoglu
Supriyo Chakraborty
Chia-Hsuan Lee
Sambit Sahu
394
0
0
16 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yi Su
Yuechi Zhou
Quantong Qiu
Jilong Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
291
5
0
16 May 2025
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun
Shengyi Liao
Yansen Han
Yu Bai
Yang Gao
...
Weizhou Shen
Fanqi Wan
Ming Yan
J.N. Zhang
Fei Huang
523
1
0
16 May 2025
A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian
Pegah Jandaghi
Negar Mokhberian
Sai Praneeth Karimireddy
Jay Pujara
290
0
0
16 May 2025
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammadtaha Bagherifard
Sahar Rajabi
Ali Edalat
Yadollah Yaghoobzadeh
KELM
250
0
0
16 May 2025
Rethinking Prompt Optimizers: From Prompt Merits to Optimization
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Tianjiao Li
Chua Jia Jim Deryl
Mak Lee Onn
Gee Wah Ng
Kezhi Mao
LRM
331
1
0
15 May 2025