2210.09261
Cited By
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"
50 / 788 papers shown
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
56
2
0
25 Mar 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML
ALM
ELM
50
0
0
25 Mar 2025
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li
Rushi Qiang
Lama Moukheiber
Chao Zhang
LRM
41
0
0
24 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
55
0
0
24 Mar 2025
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse
Ling Team
Wenting Cai
Yuchen Cao
C. Chen
...
Wei Zhang
Z. Zhang
Hailin Zhao
Xunjin Zheng
Jun Zhou
ALM
MoE
49
0
0
22 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
40
0
0
22 Mar 2025
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
Jian Zhang
Z. Wang
Haiping Zhu
Jun Liu
Qika Lin
Erik Cambria
LLMAG
78
0
0
21 Mar 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
44
1
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
5
0
20 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
N. Zhang
Shumin Deng
H. Chen
Nanyun Peng
KELM
54
1
0
20 Mar 2025
COPA: Comparing the Incomparable to Explore the Pareto Front
Adrián Javaloy
Antonio Vergari
Isabel Valera
52
0
0
18 Mar 2025
Command R7B Arabic: A Small, Enterprise Focused, Multilingual, and Culturally Aware Arabic LLM
Yazeed Alnumay
Alexandre Barbet
Anna Bialas
William Darling
Shaan Desai
...
Stephanie Howe
Olivia Lasche
Justin Lee
Anirudh Shrinivason
Jennifer Tracey
82
0
0
18 Mar 2025
Pensez: Less Data, Better Reasoning -- Rethinking French LLM
Huy Hoang Ha
ReLM
LRM
55
1
0
17 Mar 2025
DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
Dengyun Peng
Yuhang Zhou
Qiguang Chen
Jinhao Liu
Jingjing Chen
L. Qin
47
0
0
17 Mar 2025
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences
Kedi Chen
Zhikai Lei
Fan Zhang
Yinqi Zhang
Qin Chen
Jie Zhou
Liang He
Qipeng Guo
K. Chen
Wei-na Zhang
ELM
LRM
49
0
0
17 Mar 2025
VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
Yoo Yeon Sung
H. Kim
Dan Zhang
55
1
0
16 Mar 2025
A Survey on Federated Fine-tuning of Large Language Models
Yebo Wu
Chunlin Tian
Jingguang Li
He Sun
Kahou Tam
Li Li
Chengzhong Xu
FedML
78
0
0
15 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
A. Vosoughi
C. L. P. Chen
Chenliang Xu
LRM
62
1
0
14 Mar 2025
"Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding
Hyunbin Jin
Je Won Yeom
Seunghyun Bae
Taesup Kim
LRM
ReLM
37
1
0
13 Mar 2025
Token Weighting for Long-Range Language Modeling
Falko Helm
Nico Daheim
Iryna Gurevych
52
1
0
12 Mar 2025
Reinforcement Learning is all You Need
Yongsheng Lian
ReLM
OffRL
LRM
65
0
0
12 Mar 2025
Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency
Siqi Fan
Xuezhi Fang
Xingrun Xing
Peng Han
Shuo Shang
Yequan Wang
46
0
0
11 Mar 2025
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao
Jingru Tan
Kaihuan Liang
Feizhao Zhang
Yazhe Niu
Jiahao Hu
Ruihao Gong
Dahua Lin
Ningyi Xu
52
0
0
10 Mar 2025
AI-driven control of bioelectric signalling for real-time topological reorganization of cells
Gonçalo Hora de Carvalho
AI4CE
38
0
0
10 Mar 2025
Mitigating Memorization in LLMs using Activation Steering
Manan Suri
Nishit Anand
Amisha Bhaskar
LLMSV
45
2
0
08 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
Liang Lin
LRM
52
1
0
08 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
C. Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
100
2
0
07 Mar 2025
Enhancing Reasoning with Collaboration and Memory
Julie Michelman
Nasrin Baratalipour
Matthew Abueg
LLMAG
FedML
59
1
0
07 Mar 2025
MastermindEval: A Simple But Scalable Reasoning Benchmark
Jonas Golde
Patrick Haller
Fabio Barth
Alan Akbik
LRM
ReLM
ELM
46
1
0
07 Mar 2025
Evaluating open-source Large Language Models for automated fact-checking
Nicoló Fontana
Francesco Corso
Enrico Zuccolotto
Francesco Pierri
HILM
47
0
0
07 Mar 2025
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination
Simin Chen
Pranav Pusarla
Baishakhi Ray
63
0
0
06 Mar 2025
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Tianyu Cui
Song-Jun Xu
Artem Moskalev
Shuwei Li
Tommaso Mansi
Mangal Prakash
Rui Liao
BDL
58
1
0
06 Mar 2025
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems
Mahfuz Ahmed Anik
Abdur Rahman
Azmine Toushik Wasi
Md Manjurul Ahsan
47
1
0
05 Mar 2025
Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm
Z. Li
Yuhao Du
Xiaoqi Jiao
Yiwen Guo
Yuege Feng
Xiang Wan
Anningzhe Gao
Jinpeng Hu
53
0
0
04 Mar 2025
Large-Scale Data Selection for Instruction Tuning
Hamish Ivison
Muru Zhang
Faeze Brahman
Pang Wei Koh
Pradeep Dasigi
ALM
65
1
0
03 Mar 2025
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers
Rin Ashizawa
Yoichi Hirose
Nozomu Yoshinari
Kento Uchida
Shinichi Shirakawa
53
0
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
18
0
03 Mar 2025
Toward Stable and Consistent Evaluation Results: A New Methodology for Base Model Evaluation
Hongzhi Luan
Changxin Tian
Zhaoxin Huan
Xiaolu Zhang
Kunlong Chen
Zhiqiang Zhang
Jun Zhou
37
1
0
02 Mar 2025
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum
Y. Huang
Hongjian Zou
Qi Ding
Yixuan Liao
X. Chen
Qian Liu
Junxian He
47
2
0
02 Mar 2025
Evaluating Polish linguistic and cultural competency in large language models
Sławomir Dadas
Małgorzata Grębowiec
Michał Perełkiewicz
Rafał Poświata
ELM
39
1
0
02 Mar 2025
Efficiently Editing Mixture-of-Experts Models with Compressed Experts
Y. He
Yang Liu
Chen Liang
Hany Awadalla
MoE
47
1
0
01 Mar 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
115
4
0
26 Feb 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura
Takuya Akiba
Kazuki Fujii
Yusuke Oda
Rio Yokota
Jun Suzuki
MoMe
MoE
67
1
0
26 Feb 2025
Automatic Prompt Optimization via Heuristic Search: A Survey
Wendi Cui
Jiaxin Zhang
Z. Li
Hao Sun
Damien Lopez
Kamalika Das
Bradley Malin
Sricharan Kumar
32
1
0
26 Feb 2025
CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
Zongzhen Yang
Binhang Qi
Hailong Sun
Wenrui Long
Ruobing Zhao
Xiang Gao
MoMe
48
0
0
26 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
81
3
0
26 Feb 2025
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan
Yun Luo
Yue Zhang
LLMAG
50
1
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
80
1
0
25 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
77
3
0
24 Feb 2025
LightThinker: Thinking Step-by-Step Compression
Jintian Zhang
Yuqi Zhu
Mengshu Sun
Yujie Luo
Shuofei Qiao
Lun Du
Da Zheng
H. Chen
N. Zhang
LRM
LLMAG
44
10
0
24 Feb 2025