ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.09261
  4. Cited By
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
    ALM
    ELM
    LRM
    ReLM
ArXivPDFHTML

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 788 papers shown
Title
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through
  Self-Correction in Language Models
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
Haritz Puerto
Tilek Chubakov
Xiaodan Zhu
Harish Tayyar Madabushi
Iryna Gurevych
ReLM
LRM
39
9
1
03 Jul 2024
Cost-Effective Proxy Reward Model Construction with On-Policy and Active
  Learning
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Yifang Chen
Shuohang Wang
Ziyi Yang
Hiteshi Sharma
Nikos Karampatziakis
Donghan Yu
Kevin G. Jamieson
Simon Shaolei Du
Yelong Shen
OffRL
38
4
0
02 Jul 2024
The Art of Saying No: Contextual Noncompliance in Language Models
The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman
Sachin Kumar
Vidhisha Balachandran
Pradeep Dasigi
Valentina Pyatkin
...
Jack Hessel
Yulia Tsvetkov
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
62
20
0
02 Jul 2024
Deciphering the Factors Influencing the Efficacy of Chain-of-Thought:
  Probability, Memorization, and Noisy Reasoning
Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
Akshara Prabhakar
Thomas L. Griffiths
R. Thomas McCoy
LRM
36
16
0
01 Jul 2024
Collaborative Performance Prediction for Large Language Models
Collaborative Performance Prediction for Large Language Models
Qiyuan Zhang
Fuyuan Lyu
Xue Liu
Chen Ma
23
3
0
01 Jul 2024
Exploring Advanced Large Language Models with LLMsuite
Exploring Advanced Large Language Models with LLMsuite
Giorgio Roffo
LLMAG
17
0
0
01 Jul 2024
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language
  Models by Learning from Knowledge Graphs
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs
Yifei Zhang
Xintao Wang
Jiaqing Liang
Sirui Xia
Lida Chen
Yanghua Xiao
LRM
27
1
0
30 Jun 2024
LLMs-as-Instructors: Learning from Errors Toward Automating Model
  Improvement
LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement
Jiahao Ying
Mingbao Lin
Yixin Cao
Wei Tang
Bo Wang
Qianru Sun
Xuanjing Huang
Shuicheng Yan
LRM
30
8
0
29 Jun 2024
It's Morphing Time: Unleashing the Potential of Multiple LLMs via
  Multi-objective Optimization
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization
Bingdong Li
Zixiang Di
Yanting Yang
Hong Qian
Peng Yang
Hao Hao
Ke Tang
Aimin Zhou
MoMe
19
5
0
29 Jun 2024
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert
  Prompts
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
Ruochen Wang
Sohyun An
Minhao Cheng
Tianyi Zhou
Sung Ju Hwang
Cho-Jui Hsieh
34
7
0
28 Jun 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept
  Space
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
50
6
0
27 Jun 2024
Aligning Teacher with Student Preferences for Tailored Training Data
  Generation
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Yantao Liu
Zhao Zhang
Zijun Yao
S. Cao
Lei Hou
Juanzi Li
42
1
0
27 Jun 2024
UniGen: A Unified Framework for Textual Dataset Generation Using Large
  Language Models
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models
Siyuan Wu
Yue Huang
Chujie Gao
Dongping Chen
Qihui Zhang
...
Tianyi Zhou
Xiangliang Zhang
Jianfeng Gao
Chaowei Xiao
Lichao Sun
SyDa
31
22
0
27 Jun 2024
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White
Samuel Dooley
Manley Roberts
Arka Pal
Ben Feuer
...
W. Neiswanger
Micah Goldblum
Tom Goldstein
Willie Neiswanger
Micah Goldblum
ELM
37
6
0
27 Jun 2024
Autonomous Prompt Engineering in Large Language Models
Autonomous Prompt Engineering in Large Language Models
Daan Kepel
Konstantina Valogianni
LLMAG
32
6
0
25 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning
  Graph
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
32
7
0
25 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
47
13
0
24 Jun 2024
Trace is the New AutoDiff -- Unlocking Efficient Optimization of
  Computational Workflows
Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows
Ching-An Cheng
Allen Nie
Adith Swaminathan
AI4CE
26
1
0
23 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
33
4
0
23 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
49
3
0
23 Jun 2024
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex
  Models
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Xinrong Zhang
Yingfa Chen
Shengding Hu
Xu Han
Zihang Xu
Yuanwei Xu
Weilin Zhao
Maosong Sun
Zhiyuan Liu
29
9
0
22 Jun 2024
Évaluation des capacités de réponse de larges modèles de langage
  (LLM) pour des questions d'historiens
Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens
M. Chartier
Nabil Dakkoune
G. Bourgeois
Stéphane Jean
KELM
ELM
23
1
0
21 Jun 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
28
1
0
20 Jun 2024
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak
Anej Svete
Alexandra Butoi
Ryan Cotterell
ReLM
LRM
44
12
0
20 Jun 2024
CityGPT: Empowering Urban Spatial Cognition of Large Language Models
CityGPT: Empowering Urban Spatial Cognition of Large Language Models
Jie Feng
Yuwei Du
Tianhui Liu
Siqi Guo
Yuming Lin
Yong Li
30
10
0
20 Jun 2024
CityBench: Evaluating the Capabilities of Large Language Model as World
  Model
CityBench: Evaluating the Capabilities of Large Language Model as World Model
Jie Feng
Jun Zhang
Junbo Yan
Xin Zhang
Tianjian Ouyang
Tianhui Liu
Yuwei Du
Siqi Guo
Yong Li
ELM
41
0
0
20 Jun 2024
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines
  via Combinatorial Optimization
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization
Mert Esencan
Tarun Advaith Kumar
A. A. Asanjan
P. A. Lott
Masoud Mohseni
Can Unlu
Davide Venturelli
A. Ho
LRM
18
0
0
19 Jun 2024
Dual-Phase Accelerated Prompt Optimization
Dual-Phase Accelerated Prompt Optimization
Muchen Yang
Moxin Li
Yongle Li
Zijun Chen
Chongming Gao
Junqi Zhang
Yangyang Li
Fuli Feng
24
0
0
19 Jun 2024
BeHonest: Benchmarking Honesty in Large Language Models
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
81
3
0
19 Jun 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jinhyuk Lee
Anthony Chen
Zhuyun Dai
Dheeru Dua
Devendra Singh Sachan
...
Jeremy R. Cole
Sebastian Riedel
Iftekhar Naim
Ming-Wei Chang
Kelvin Guu
RALM
LRM
43
30
0
19 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All
  Tools
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
62
473
0
18 Jun 2024
Breaking the Ceiling of the LLM Community by Treating Token Generation
  as a Classification for Ensembling
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
Yao-Ching Yu
Chun-Chih Kuo
Ziqi Ye
Yu-Cheng Chang
Yueh-Se Li
43
9
0
18 Jun 2024
Abstraction-of-Thought Makes Language Models Better Reasoners
Abstraction-of-Thought Makes Language Models Better Reasoners
Ruixin Hong
Hongming Zhang
Xiaoman Pan
Dong Yu
Changshui Zhang
LRM
41
3
0
18 Jun 2024
Self-MoE: Towards Compositional Large Language Models with
  Self-Specialized Experts
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang
Leonid Karlinsky
Hongyin Luo
Zhen Wang
Jacob A. Hansen
James Glass
David D. Cox
Rameswar Panda
Rogerio Feris
Alan Ritter
MoMe
MoE
34
8
0
17 Jun 2024
Prompt Design Matters for Computational Social Science Tasks but in
  Unpredictable Ways
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways
Shubham Atreja
Joshua Ashkinaze
Lingyao Li
Julia Mendelsohn
Libby Hemphill
39
12
0
17 Jun 2024
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code
  Intelligence
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
DeepSeek-AI
Qihao Zhu
Daya Guo
Zhihong Shao
Dejian Yang
...
Jiashi Li
Chenggang Zhao
Chong Ruan
Fuli Luo
Wenfeng Liang
MoE
LRM
ELM
VLM
48
149
0
17 Jun 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu
Daize Dong
Xiaoye Qu
Jiacheng Ruan
Wenliang Chen
Yu Cheng
MoE
37
7
0
17 Jun 2024
A Survey on Human Preference Learning for Large Language Models
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
39
8
0
17 Jun 2024
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language
  Models
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Zhuoran Jin
Pengfei Cao
Chenhao Wang
Zhitao He
Hongbang Yuan
Jiachun Li
Yubo Chen
Kang Liu
Jun Zhao
KELM
MU
37
12
0
16 Jun 2024
A Comprehensive Survey of Scientific Large Language Models and Their
  Applications in Scientific Discovery
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Yu Zhang
Xiusi Chen
Bowen Jin
Sheng Wang
Shuiwang Ji
Wei Wang
Jiawei Han
40
27
0
16 Jun 2024
StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language
  Model's Ability in Structure-Rich Text Understanding
StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding
Zhouhong Gu
Haoning Ye
Zeyang Zhou
Hongwei Feng
Yanghua Xiao
ELM
42
1
0
15 Jun 2024
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
TJ Dunham
Henry Syahputra
21
1
0
15 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
33
9
0
14 Jun 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from
  Preference Feedback
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison
Yizhong Wang
Jiacheng Liu
Zeqiu Wu
Valentina Pyatkin
Nathan Lambert
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
36
38
0
13 Jun 2024
StreamBench: Towards Benchmarking Continuous Improvement of Language
  Agents
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Cheng-Kuang Wu
Zhi Rui Tam
Chieh-Yen Lin
Yun-Nung Chen
Hung-yi Lee
LLMAG
19
6
0
13 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
44
10
0
12 Jun 2024
TextGrad: Automatic "Differentiation" via Text
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG
OOD
AI4CE
37
31
0
11 Jun 2024
When Linear Attention Meets Autoregressive Decoding: Towards More
  Effective and Efficient Linearized Large Language Models
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You
Yichao Fu
Zheng Wang
Amir Yazdanbakhsh
Yingyan Celine Lin
18
1
0
11 Jun 2024
Teaching Language Models to Self-Improve by Learning from Language
  Feedback
Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu
Yimin Hu
Hang Cao
Tong Xiao
Jingbo Zhu
LRM
VLM
25
4
0
11 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
37
1
0
08 Jun 2024
Previous
123...678...141516
Next