2210.09261
Cited By
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Schärli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
Papers citing
"Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"
50 / 1,089 papers shown
Title
Finance Language Model Evaluation (FLaME)
Glenn Matlin
Mika Okamoto
Huzaifa Pardawala
Yang Yang
Sudheer Chava
AIFin
LRM
170
1
0
18 Jun 2025
GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors
Hengyuan Zhang
Xinrong Chen
Yingmin Qiu
Xiao Liang
Ziyue Li
Guanyu Wang
Weiping Li
Tong Mo
Wenyue Li
Hayden Kwok-Hay So
MoE
ALM
165
2
0
17 Jun 2025
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
Shen Yuan
Yin Zheng
Taifeng Wang
Binbin Liu
Hongteng Xu
MoMe
284
1
0
17 Jun 2025
BOW: Reinforcement Learning for Bottlenecked Next Word Prediction
Ming Shen
Zhikun Xu
Xiao Ye
Jacob Dineen
Ben Zhou
OffRL
LRM
184
0
0
16 Jun 2025
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Badr AlKhamissi
C. Nicolò De Sabbata
Greta Tuckute
Zeming Chen
Martin Schrimpf
Antoine Bosselut
MoE
LRM
185
3
0
16 Jun 2025
Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
Haonan Wang
Brian K Chen
Siquan Li
Xinhe Liang
Hwee Kuan Lee
Kenji Kawaguchi
Tianyang Hu
157
0
0
16 Jun 2025
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qiming Ge
Shuhao Xing
Songyang Gao
Yunhua Zhou
Yicheng Zou
...
Zhi Chen
Hang Yan
Qi Zhang
Q. Guo
Kai Chen
170
0
0
16 Jun 2025
GTA: Grouped-head latenT Attention
Luoyang Sun
Cheng Deng
Jiwen Jiang
Xinjian Wu
Haifeng Zhang
Lei Chen
Lionel M. Ni
Ning Yang
145
1
0
15 Jun 2025
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
Xingjian Diao
Chunhui Zhang
Keyi Kong
Weiyi Wu
Chiyu Ma
Z. Ouyang
Peijun Qing
Soroush Vosoughi
Jiang Gui
AuLLM
OffRL
ReLM
LRM
167
8
0
15 Jun 2025
Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models
Kaiyuan Liu
Chen Shen
Zhanwei Zhang
Junjie Liu
Xiaosong Yuan
Jieping Ye
ReLM
LRM
203
8
0
14 Jun 2025
Bhatt Conjectures: On Necessary-But-Not-Sufficient Benchmark Tautology for Human Like Reasoning
Manish Bhatt
LRM
164
0
0
13 Jun 2025
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
Yuan Gao
Mattia Piccinini
Yuchen Zhang
Dingrui Wang
Korbinian Moller
...
Steven Peters
Andrea Stocco
Bassam Alrifaee
Marco Pavone
Johannes Betz
191
17
0
13 Jun 2025
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu
Hamish Ivison
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
227
2
0
13 Jun 2025
Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Jikai Jin
Vasilis Syrgkanis
Sham Kakade
Hanlin Zhang
ELM
295
0
0
12 Jun 2025
Code Execution as Grounded Supervision for LLM Reasoning
Dongwon Jung
Wenxuan Zhou
Muhao Chen
OffRL
LRM
281
2
0
12 Jun 2025
BF-Max: an Efficient Bit Flipping Decoder with Predictable Decoding Failure Rate
International Symposium on Information Theory (ISIT), 2025
Alessio Baldelli
Marco Baldi
F. Chiaraluce
Paolo Santini
304
2
0
11 Jun 2025
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
Prakamya Mishra
Jiang-Long Liu
Jialian Wu
Xiaodong Yu
Zicheng Liu
Emad Barsoum
LRM
200
1
0
11 Jun 2025
RePO: Replay-Enhanced Policy Optimization
Siheng Li
Zhanhui Zhou
W. Lam
Chao Yang
Chaochao Lu
OffRL
262
9
0
11 Jun 2025
LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge
Songze Li
Chuokun Xu
Jiaying Wang
Xueluan Gong
Chen Chen
J. Zhang
Jun Wang
K. Lam
R. Beyah
AAML
ELM
301
6
0
11 Jun 2025
LLM-as-a-qualitative-judge: automating error analysis in natural language generation
Nadezhda Chirkova
Tunde Oluwaseyi Ajayi
Seth Aycock
Zain Muhammad Mujahid
Vladana Perlić
Ekaterina Borisova
Markarit Vartampetian
ELM
222
0
0
10 Jun 2025
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
Dongge Han
Menglin Xia
Daniel Madrigal Diaz
Samuel Kessler
Ankur Mallick
Xuchao Zhang
Mirian Hipolito Garcia
Jin Xu
Victor Rühle
Saravan Rajmohan
LRM
163
0
0
10 Jun 2025
Transforming Expert Knowledge into Scalable Ontology via Large Language Models
Ikkei Itoku
David Theil
Evelyn Eichelsdoerfer Uehara
S. Bhaduri
Junnosuke Kuroda
Toshi Yumoto
Alex Gil
Natalie Perez
Rajesh Kumar Cherukuri
Naumaan Nayyar
218
0
0
10 Jun 2025
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu
L. Jiang
Yancheng Liang
S. Du
Yejin Choi
Tim Althoff
Natasha Jaques
AAML
LRM
215
12
0
09 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
259
19
0
09 Jun 2025
Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models
Samir Abdaljalil
Hasan Kurban
K. Qaraqe
E. Serpedin
LM&Ro
LRM
139
2
0
08 Jun 2025
United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory
HaoYang Shang
Xuan Liu
Zi Liang
J. Zhang
Haibo Hu
Song Guo
LLMAG
174
4
0
07 Jun 2025
Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hongming Yang
Shi Lin
Jun Shao
Changting Lin
Donghai Zhu
Meng Han
Qinglei Kong
147
2
0
06 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
171
3
0
06 Jun 2025
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei
Peizhi Niu
Hans Hao-Hsun Hsu
Ruihan Wu
Haoteng Yin
...
Vamsi K. Potluru
Eli Chien
Kamalika Chaudhuri
S. Rasoul Etesami
P. Li
MU
KELM
443
6
0
06 Jun 2025
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jun Rao
Zepeng Lin
Xuebo Liu
Xiaopeng Ke
Lian Lian
Dong Jin
Shengjun Cheng
Jun Yu
Min Zhang
189
6
0
04 Jun 2025
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao
Zhichang Guo
Dazhi Zhang
Dong Li
Runze Liu
Pengfei Li
Kai Tian
Biqing Qi
330
0
0
04 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Jiajun Sun
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Qi Zhang
Xuanjing Huang
ELM
262
2
0
03 Jun 2025
PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs
Ze Yu Zhang
Bolin Ding
Bryan Kian Hsiang Low
MoE
283
0
0
03 Jun 2025
Adaptive Task Vectors for Large Language Models
Joonseong Kang
Soojeong Lee
Subeen Park
Sumin Park
Taero Kim
Jihee Kim
Ryunyi Lee
Kyungwoo Song
210
0
0
03 Jun 2025
Data Pruning by Information Maximization
International Conference on Learning Representations (ICLR), 2025
Haoru Tan
Sitong Wu
Wei Huang
Shizhen Zhao
Xiaojuan Qi
295
7
0
02 Jun 2025
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu
Faisal Hamman
Sanghamitra Dutta
ALM
263
6
0
02 Jun 2025
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
Xinyi Wang
Lirong Gao
Haobo Wang
Yiming Zhang
Junbo Zhao
MoE
179
0
0
31 May 2025
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Zafir Stojanovski
Oliver Stanley
Joe Sharratt
Richard Jones
Abdulhakeem Adefioye
Jean Kaddour
Andreas Kopf
OffRL
LRM
329
36
0
30 May 2025
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text
Li Yunhan
Wu Gengshen
AILaw
ELM
ALM
371
1
0
30 May 2025
Circuit Stability Characterizes Language Model Generalization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Alan Sun
LRM
216
2
0
30 May 2025
Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings
Anirudh Nair
Adi Banerjee
Laurent Mombaerts
Matthew Hagen
Tarik Borogovac
233
2
0
30 May 2025
Semi-structured LLM Reasoners Can Be Rigorously Audited
Jixuan Leng
Cassandra A. Cohen
Zhixian Zhang
Chenyan Xiong
William W. Cohen
LRM
171
1
0
30 May 2025
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Halil Alperen Gozeten
M. E. Ildiz
Xuechen Zhang
Hrayr Harutyunyan
A. S. Rawat
Samet Oymak
LRM
309
8
0
29 May 2025
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
S. Balasubramanian
Samyadeep Basu
Soheil Feizi
LRM
159
3
0
29 May 2025
PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics
Atharva Naik
Darsh Agrawal
Yash Mathur
Manav Kapadnis
Yuwei An
Clayton Marr
Carolyn Rose
David R. Mortensen
LRM
ELM
161
0
0
29 May 2025
Domain-Aware Tensor Network Structure Search
Giorgos Iacovides
Wuyang Zhou
Chao Li
Qibin Zhao
Danilo Mandic
172
1
0
29 May 2025
Scalable Complexity Control Facilitates Reasoning Ability of LLMs
Liangkai Hang
Junjie Yao
Zhiwei Bai
Tianyi Chen
Yang Chen
...
Feiyu Xiong
Y. Zhang
Weinan E
Hongkang Yang
Zhi-hai Xu
LRM
181
2
0
29 May 2025
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective
Qingchuan Ma
Yuhang Wu
Xiawu Zheng
Rongrong Ji
162
1
0
28 May 2025
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
Zhiyuan Li
Yi-Ju Chang
Yuan Wu
LLMAG
LRM
165
6
0
28 May 2025
LASER: Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
224
0
0
28 May 2025