Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
    ALM
    ELM
    LRM
    ReLM

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 788 papers shown
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
41
0
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
69
0
0
24 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
77
3
0
24 Feb 2025
Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
H. Li
Tao Xie
Baishakhi Ray
33
2
0
23 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
47
1
0
22 Feb 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Z. Chen
Mingxiao Li
Shangsong Liang
Z. Ren
V. Honavar
90
5
0
21 Feb 2025
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale
Axel Højmark
Jérémy Scheurer
Marius Hobbhahn
LLMAG
ELM
41
1
0
21 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
B. Wang
Weipeng Chen
VLM
89
0
0
20 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Z. Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
35
1
0
20 Feb 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Y. Li
J. Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
70
0
0
17 Feb 2025
System Message Generation for User Preferences using Open-Source Models
Minbyul Jeong
Jungho Cho
Minsoo Khang
Dawoon Jung
Teakgyu Hong
36
0
0
17 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
75
2
0
17 Feb 2025
Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng
Zhaoyang Yu
Quan Shi
Jiayi Zhang
Chenglin Wu
Yuyu Luo
MU
LRM
49
13
0
17 Feb 2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Jafar Isbarov
Arofat Akhundjanova
Mammad Hajili
Kavsar Huseynova
Dmitry Gaynullin
...
Amina Alisheva
Aizirek Turdubaeva
Abdullatif Köksal
Samir Rustamov
Duygu Ataman
ELM
35
0
0
16 Feb 2025
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
84
12
0
14 Feb 2025
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart
Pablo A. Martin-Torres
Daniel Hinjos
Pablo Bernabeu Perez
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Sergio Álvarez Napagao
Dario Garcia-Gasulla
LM&MA
ELM
100
1
0
10 Feb 2025
FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy
Xuemiao Zhang
Feiyu Duan
Liangyu Xu
Yongwei Zhou
Sirui Wang
Rongxiang Weng
J. Wang
Xunliang Cai
55
0
0
08 Feb 2025
Self-Supervised Prompt Optimization
Jinyu Xiang
Jiayi Zhang
Zhaoyang Yu
Fengwei Teng
Jinhao Tu
Xinbing Liang
Sirui Hong
Chenglin Wu
Yuyu Luo
OffRL
LRM
57
5
0
07 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
113
3
0
06 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
70
3
0
04 Feb 2025
QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning
Moses Ananta
Muhammad Farid Adilazuarda
Zayd Muhammad Kawakibi Zuhri
Ayu Purwarianti
Alham Fikri Aji
MQ
47
0
0
03 Feb 2025
Ensembles of Low-Rank Expert Adapters
Yinghao Li
Vianne Gao
Chao Zhang
MohamadAli Torkamani
55
0
0
31 Jan 2025
TableMaster: A Recipe to Advance Table Understanding with Language Models
Lang Cao
Hanbing Liu
LMTD
RALM
117
0
1
31 Jan 2025
LCTG Bench: LLM Controlled Text Generation Benchmark
K. K.
Masato Mita
Peinan Zhang
S. Sasaki
Ryosuke Ishigami
Naoaki Okazaki
55
0
0
28 Jan 2025
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Siyuan Lu
Yaliang Li
Ji-Rong Wen
LRM
41
13
0
28 Jan 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin
Zhaoran Wang
LRM
ReLM
43
0
0
27 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
45
5
0
21 Jan 2025
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Qirun Dai
Dylan Zhang
Jiaqi W. Ma
Hao Peng
TDI
46
1
0
21 Jan 2025
TAPO: Task-Referenced Adaptation for Prompt Optimization
Wenxin Luo
W. Wang
Xiaopeng Li
Weibo Zhou
Pengyue Jia
Xiangyu Zhao
45
0
0
12 Jan 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James Validad Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
45
8
0
08 Jan 2025
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
31
0
0
31 Dec 2024
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
Chenlong Deng
Zhisong Zhang
Kelong Mao
Shuaiyi Li
Xinting Huang
Dong Yu
Zhicheng Dou
36
1
0
23 Dec 2024
Boosting LLM via Learning from Data Iteratively and Selectively
Qi Jia
Siyu Ren
Ziheng Qin
Fuzhao Xue
Jinjie Ni
Yang You
19
0
0
23 Dec 2024
Revisiting In-Context Learning with Long Context Language Models
Jinheon Baek
Sun Jae Lee
Prakhar Gupta
Geunseob Oh
Siddharth Dalmia
84
0
0
22 Dec 2024
NILE: Internal Consistency Alignment in Large Language Models
Minda Hu
Qiyuan Zhang
Yufei Wang
Bowei He
Hongru Wang
Jingyan Zhou
Liangyou Li
Yasheng Wang
Chen-li Ma
Irwin King
81
0
0
21 Dec 2024
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
82
20
0
20 Dec 2024
Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
Federico Castagna
I. Sassoon
Simon Parsons
LRM
85
0
0
19 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
78
6
0
17 Dec 2024
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Siyuan Wang
Dianyi Wang
Chengxing Zhou
Zejun Li
Zhihao Fan
Xuanjing Huang
Zhongyu Wei
VLM
82
0
0
17 Dec 2024
C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
Yu Kang
Xianghui Sun
Liangyu Chen
Wei Zou
LRM
67
18
0
16 Dec 2024
Codenames as a Benchmark for Large Language Models
Matthew Stephenson
Matthew Sidji
Benoît Ronval
LLMAG
LRM
ELM
95
1
0
16 Dec 2024
Predictable Emergent Abilities of LLMs: Proxy Tasks Are All You Need
Bo Zhang
Yan Yan
Boxiang Yang
Yifei Xue
Guang Liu
LRM
71
0
0
10 Dec 2024
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo
S. Kamath S
Leshem Choshen
Yuekai Sun
Mikhail Yurochkin
76
5
0
09 Dec 2024
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning
Mathurin Videau
Alessandro Leite
Marc Schoenauer
O. Teytaud
ReLM
LRM
66
0
0
05 Dec 2024
Bench-CoE: a Framework for Collaboration of Experts from Benchmark
Yuanshuai Wang
Xingjian Zhang
Jinkun Zhao
Siwei Wen
Peilin Feng
Shuhao Liao
Lei Huang
Wenjun Wu
MoE
ALM
78
2
0
05 Dec 2024
Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index
Tyler McDonald
Anthony Colosimo
Yifeng Li
Ali Emami
65
1
0
02 Dec 2024
INTELLECT-1 Technical Report
Sami Jaghouar
Jack Min Ong
Manveer Basra
Fares Obeid
Jannik Straube
...
Lucas Atkins
Maziyar Panahi
Charles Goddard
Max Ryabinin
Johannes Hagemann
MoE
62
1
0
02 Dec 2024
AI Benchmarks and Datasets for LLM Evaluation
Todor Ivanov
Valeri Penchev
96
0
0
02 Dec 2024
TAROT: Targeted Data Selection via Optimal Transport
Lan Feng
Fan Nie
Yuejiang Liu
Alexandre Alahi
OT
121
0
0
30 Nov 2024
Predicting Emergent Capabilities by Finetuning
Charlie Snell
Eric Wallace
Dan Klein
Sergey Levine
ELM
LRM
75
5
0
25 Nov 2024