ResearchTrend.AI
arXiv: 2210.09261
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
    ALM
    ELM
    LRM
    ReLM

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 788 papers shown
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
M. Zhang
KELM
MU
18
26
0
16 Aug 2023
CausalLM is not optimal for in-context learning
Nan Ding
Tomer Levinboim
Jialin Wu
Sebastian Goodman
Radu Soricut
8
23
0
14 Aug 2023
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Xiaochuang Han
17
19
0
08 Aug 2023
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu
Xukun Liu
Hua Shen
Zeyu Han
Yuhan Li
Murong Yue
Zhi-Ping Peng
Yuchen Liu
Ziyu Yao
Dongkuan Xu
LLMAG
14
18
0
08 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
20
65
0
07 Aug 2023
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Liangming Pan
Michael Stephen Saxon
Wenda Xu
Deepak Nathani
Xinyi Wang
William Yang Wang
KELM
LRM
18
200
0
06 Aug 2023
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
Keyu Pan
Yawen Zeng
LLMAG
13
41
0
30 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
17
193
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
30
132
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
19
96
0
20 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
15
42
0
20 Jul 2023
Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
Shriyash Upadhyay
Etan Ginsberg
SyDa
LRM
11
0
0
20 Jul 2023
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
Rylan Schaeffer
Kateryna Pistunova
Samarth Khanna
Sarthak Consul
Oluwasanmi Koyejo
ReLM
LRM
23
6
0
20 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
52
10,890
0
18 Jul 2023
Large Language Models Perform Diagnostic Reasoning
Cheng-Kuang Wu
Wei-Lin Chen
Hsin-Hsi Chen
ReLM
ELM
LM&MA
LRM
16
8
0
18 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Tianyi Zhou
Heng-Chiao Huang
Hongxia Jin
ALM
39
0
0
17 Jul 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQ
LRM
25
31
0
16 Jul 2023
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Cheng-rong Li
Jindong Wang
Yixuan Zhang
Kaijie Zhu
Wenxin Hou
Jianxun Lian
Fang Luo
Qiang Yang
Xingxu Xie
LRM
64
116
0
14 Jul 2023
Large Language Models as General Pattern Machines
Suvir Mirchandani
F. Xia
Peter R. Florence
Brian Ichter
Danny Driess
Montse Gonzalez Arenas
Kanishka Rao
Dorsa Sadigh
Andy Zeng
LLMAG
34
183
0
10 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
22
17
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
31
219
0
04 Jul 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
Liunian Harold Li
Jack Hessel
Youngjae Yu
Xiang Ren
Kai-Wei Chang
Yejin Choi
LRM
AI4CE
ReLM
20
80
0
24 Jun 2023
Joint Prompt Optimization of Stacked LLMs using Variational Inference
Alessandro Sordoni
Xingdi Yuan
Marc-Alexandre Côté
Matheus Pereira
Adam Trischler
Ziang Xiao
Arian Hosseini
Friederike Niedtner
Nicolas Le Roux
12
16
0
21 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM
ALM
20
42
0
15 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM
ELM
17
55
0
09 Jun 2023
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
17
60
0
07 Jun 2023
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Yizhong Wang
Hamish Ivison
Pradeep Dasigi
Jack Hessel
Tushar Khot
...
David Wadden
Kelsey MacMillan
Noah A. Smith
Iz Beltagy
Hannaneh Hajishirzi
ALM
ELM
11
364
0
07 Jun 2023
Certified Deductive Reasoning with Language Models
Gabriel Poesia
Kanishk Gandhi
E. Zelikman
Noah D. Goodman
ELM
ReLM
LRM
19
0
0
06 Jun 2023
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Subhabrata Mukherjee
Arindam Mitra
Ganesh Jawahar
Sahaj Agarwal
Hamid Palangi
Ahmed Hassan Awadallah
ELM
ALM
LRM
22
257
0
05 Jun 2023
Multilingual Conceptual Coverage in Text-to-Image Models
Michael Stephen Saxon
William Yang Wang
EGVM
19
8
0
02 Jun 2023
Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation
Adithya V Ganesan
Yash Kumar Lal
August Håkan Nilsson
H. A. Schwartz
19
22
0
01 Jun 2023
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Shalev Lifshitz
Keiran Paster
Harris Chan
Jimmy Ba
Sheila A. McIlraith
LM&Ro
11
66
0
01 Jun 2023
Grammar Prompting for Domain-Specific Language Generation with Large Language Models
Bailin Wang
Zi Wang
Xuezhi Wang
Yuan Cao
Rif A. Saurous
Yoon Kim
ReLM
LRM
17
21
0
30 May 2023
Strategic Reasoning with Language Models
Kanishk Gandhi
Dorsa Sadigh
Noah D. Goodman
LM&Ro
LRM
27
34
0
30 May 2023
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Tian Liang
Zhiwei He
Wenxiang Jiao
Xing Wang
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
LLMAG
LRM
17
390
0
30 May 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
31
175
0
29 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG
ELM
LRM
ReLM
19
108
0
26 May 2023
Large Language Models as Tool Makers
Tianle Cai
Xuezhi Wang
Tengyu Ma
Xinyun Chen
Denny Zhou
LLMAG
24
182
0
26 May 2023
Passive learning of active causal strategies in agents and language models
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
14
15
0
25 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
17
210
0
24 May 2023
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations
Wei-Lin Chen
Cheng-Kuang Wu
Yun-Nung Chen
Hsin-Hsi Chen
8
27
0
24 May 2023
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
Qinyuan Ye
Harvey Yiyun Fu
Xiang Ren
Robin Jia
ELM
11
20
0
24 May 2023
Universal Self-Adaptive Prompting
Xingchen Wan
Ruoxi Sun
Hootan Nakhost
H. Dai
Julian Martin Eisenschlos
Sercan Ö. Arik
Tomas Pfister
LRM
23
9
0
24 May 2023
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
17
53
0
24 May 2023
Do prompt positions really matter?
Junyu Mao
Stuart E. Middleton
Mahesan Niranjan
VLM
19
3
0
23 May 2023
Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science
Yida Mu
Benze Wu
William Thorne
Ambrose Robinson
Nikolaos Aletras
Carolina Scarton
Kalina Bontcheva
Xingyi Song
11
17
0
23 May 2023
VIP5: Towards Multimodal Foundation Models for Recommendation
Shijie Geng
Juntao Tan
Shuchang Liu
Zuohui Fu
Yongfeng Zhang
6
69
0
23 May 2023
Dr.ICL: Demonstration-Retrieved In-context Learning
Man Luo
Xin Xu
Zhuyun Dai
Panupong Pasupat
Mehran Kazemi
Chitta Baral
Vaiva Imbrasaite
Vincent Zhao
RALM
11
48
0
23 May 2023
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Seungone Kim
Se June Joo
Doyoung Kim
Joel Jang
Seonghyeon Ye
Jamin Shin
Minjoon Seo
ALM
RALM
LRM
8
55
0
23 May 2023
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
Xuekai Zhu
Biqing Qi
Kaiyan Zhang
Xingwei Long
Zhouhan Lin
Bowen Zhou
ALM
LRM
20
18
0
23 May 2023