ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.10558
  4. Cited By
Instruction-following Evaluation through Verbalizer Manipulation

Instruction-following Evaluation through Verbalizer Manipulation

20 July 2023
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
ArXivPDFHTML

Papers citing "Instruction-following Evaluation through Verbalizer Manipulation"

29 / 29 papers shown
Title
Spoken Language Understanding on Unseen Tasks With In-Context Learning
Spoken Language Understanding on Unseen Tasks With In-Context Learning
Neeraj Agrawal
Sriram Ganapathy
11
0
0
12 May 2025
LLMs Get Lost In Multi-Turn Conversation
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
34
0
0
09 May 2025
Smaller Large Language Models Can Do Moral Self-Correction
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
K. Johnson
Kristen Marie Johnson
LRM
23
0
0
30 Oct 2024
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks
Rudra Murthy
Prince Kumar
Praveen Venkateswaran
Danish Contractor
KELM
ALM
ELM
26
1
0
16 Oct 2024
RNR: Teaching Large Language Models to Follow Roles and Rules
RNR: Teaching Large Language Models to Follow Roles and Rules
Kuan-Chieh Jackson Wang
Alexander Bukharin
Haoming Jiang
Qingyu Yin
Zhengyang Wang
...
Chao Zhang
Bing Yin
Xian Li
Jianshu Chen
Shiyang Li
ALM
26
1
0
10 Sep 2024
SysBench: Can Large Language Models Follow System Messages?
SysBench: Can Large Language Models Follow System Messages?
Yanzhao Qin
Tao Zhang
Tao Zhang
Yanjun Shen
Wenjing Luo
...
Yujing Qiao
Weipeng Chen
Zenan Zhou
Wentao Zhang
Bin Cui
ALM
70
7
0
20 Aug 2024
On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard
  of Oz Experiments
On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments
Jingchao Fang
Nikos Aréchiga
Keiichi Namaoshi
N. Bravo
Candice L Hogan
David A. Shamma
30
3
0
10 Jul 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and
  Latent Concept
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
Guangliang Liu
Haitao Mao
Bochuan Cao
Zhiyu Xue
K. Johnson
Jiliang Tang
Rongrong Wang
LRM
32
9
0
04 Jun 2024
From Complex to Simple: Enhancing Multi-Constraint Complex Instruction
  Following Ability of Large Language Models
From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models
Qi He
Jie Zeng
Qianxi He
Jiaqing Liang
Yanghua Xiao
32
9
0
24 Apr 2024
Fine-Tuning Language Models with Reward Learning on Policy
Fine-Tuning Language Models with Reward Learning on Policy
Hao Lang
Fei Huang
Yongbin Li
ALM
32
5
0
28 Mar 2024
MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task
  Planning with Open-Source Large Language Model
MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model
Yike Wu
Jiatao Zhang
Nan Hu
LanLing Tang
Guilin Qi
Jun Shao
Jie Ren
Wei Song
57
10
0
27 Mar 2024
Improving the Robustness of Large Language Models via Consistency
  Alignment
Improving the Robustness of Large Language Models via Consistency Alignment
Zhao Yukun
Lingyong Yan
Weiwei Sun
Guoliang Xing
Shuaiqiang Wang
Meng Chong
Zhicong Cheng
Zhaochun Ren
Yin Dawei
33
18
0
21 Mar 2024
RefuteBench: Evaluating Refuting Instruction-Following for Large
  Language Models
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
Jianhao Yan
Yun Luo
Yue Zhang
ALM
LRM
33
6
0
21 Feb 2024
Measuring and Controlling Instruction (In)Stability in Language Model
  Dialogs
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li
Tianle Liu
Naomi Bashkansky
David Bau
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
16
6
0
13 Feb 2024
InFoBench: Evaluating Instruction Following Ability in Large Language
  Models
InFoBench: Evaluating Instruction Following Ability in Large Language Models
Yiwei Qin
Kaiqiang Song
Yebowen Hu
Wenlin Yao
Sangwoo Cho
Xiaoyang Wang
Xuansheng Wu
Fei Liu
Pengfei Liu
Dong Yu
ELM
18
35
0
07 Jan 2024
Instructive Decoding: Instruction-Tuned Large Language Models are
  Self-Refiner from Noisy Instructions
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
Taehyeon Kim
Joonkee Kim
Gihun Lee
Se-Young Yun
22
11
0
01 Nov 2023
Evaluating Large Language Models at Evaluating Instruction Following
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
28
166
0
11 Oct 2023
PACIT: Unlocking the Power of Examples for Better In-Context Instruction
  Tuning
PACIT: Unlocking the Power of Examples for Better In-Context Instruction Tuning
Tianci Xue
Ziqi Wang
Yixia Li
Yun-Nung Chen
Guanhua Chen
18
2
0
02 Oct 2023
Evaluating Instruction-Tuned Large Language Models on Code Comprehension
  and Generation
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation
Zhiqiang Yuan
Junwei Liu
Qiancheng Zi
Mingwei Liu
Xin Peng
Yiling Lou
ALM
ELM
LRM
17
72
0
02 Aug 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
236
2,232
0
22 Mar 2023
Can Large Language Models Truly Understand Prompts? A Case Study with
  Negated Prompts
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts
Joel Jang
Seonghyeon Ye
Minjoon Seo
ELM
LRM
87
64
0
26 Sep 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
4,048
0
24 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
208
1,654
0
15 Oct 2021
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural
  Language Inference
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,584
0
21 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1