2408.01122
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
2 August 2024
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
H. Liang
Fan Yang
Mingan Lin
Yujing Qiao
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Guosheng Dong
ELM
Papers citing "CFBench: A Comprehensive Constraints-Following Benchmark for LLMs"
50 / 56 papers shown
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
Xingwei He
Qianru Zhang
Pengfei Chen
Guanhua Chen
Linlin Yu
Yuan Yuan
Siu-Ming Yiu
97
0
0
18 Nov 2025
GraphIF: Enhancing Multi-Turn Instruction Following for Large Language Models with Relation Graph Prompt
Z. Li
Can Lin
Ling Zheng
Wen-Da Wei
Junli Liang
Qi Song
168
0
0
13 Nov 2025
One Battle After Another: Probing LLMs' Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework
Qi Jia
Kaiwei Zhang
Xiujie Song
Ye Shen
Xiangyang Zhu
Guangtao Zhai
ALM
108
0
0
05 Nov 2025
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation
Bosi Wen
Y. Niu
C. Wang
Pei Ke
Xiaoying Ling
Y. Zhang
A. Zeng
Hongning Wang
Shiyu Huang
ALM
116
0
0
02 Nov 2025
IF-VidCap: Can Video Caption Models Follow Instructions?
S. Li
Y. Zhang
J. Wu
Zhide Lei
Yiwen He
...
Yingshui Tan
Y. Wang
Qianqian Xie
Zhaoxiang Zhang
Jiaheng Liu
VLM
81
2
0
21 Oct 2025
Higher Satisfaction, Lower Cost: A Technical Report on How LLMs Revolutionize Meituan's Intelligent Interaction Systems
Xuxin Cheng
Ke Zeng
Z. Cao
Linyi Dai
Wenxuan Gao
...
X. Wang
Bo Xiao
W. Yao
Qianlin Zhou
Benchang Zhu
70
0
0
15 Oct 2025
Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation
Seungseop Lim
Gibaeg Kim
Wooseok Han
Jean Seo
Hyunkyung Lee
Jaehyo Yoo
Eunho Yang
LM&MA
250
0
0
02 Oct 2025
When Instructions Multiply: Measuring and Estimating LLM Capabilities of Multiple Instructions Following
Keno Harada
Yudai Yamazaki
Masachika Taniguchi
Edison Marrese-Taylor
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
ALM
114
0
0
25 Sep 2025
TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
J. Park
Jongyoon Song
Minjin Choi
Kyuho Heo
Taehun Huh
Ji Won Kim
52
0
0
24 Sep 2025
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Baichuan-M2 Team
Chengfeng Dou
Chong Liu
Chenzheng Zhu
Fei Li
...
Zheng Liang
Zhishou Zhang
Hengfu Cui
Zuyi Zhu
X. Wang
LM&MA
ELM
LRM
104
13
0
02 Sep 2025
R-ConstraintBench: Evaluating LLMs on NP-Complete Scheduling
Raj Jain
Marc Wetter
49
1
0
21 Aug 2025
Prompt-Based One-Shot Exact Length-Controlled Generation with LLMs
Juncheng Xie
Hung-yi Lee
68
0
0
19 Aug 2025
Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following
Chenyang Wang
Liang Wen
Shousheng Jia
Xiangzheng Zhang
Liang Xu
LRM
86
1
0
05 Aug 2025
EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models
Tao Zou
Xinghua Zhang
Haiyang Yu
Minzheng Wang
Fei Huang
Yongbin Li
165
1
0
10 Jun 2025
Reverse Preference Optimization for Complex Instruction Following
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiang Huang
Ting-En Lin
Feiteng Fang
Yuchuan Wu
Hangyu Li
Yuzhong Qu
Fei Huang
Yongbin Li
160
1
0
28 May 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
446
17
0
10 Apr 2025
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
Zhiyu Li
Kehai Chen
Yunfei Long
X. Bai
Yaoyin Zhang
Xuchen Wei
Junlin Li
Min Zhang
ELM
161
2
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
267
3
0
09 Mar 2025
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
K. Yan
Hongcheng Guo
Xuanqing Shi
Jinfeng Xu
Yaonan Gu
Hui Yuan
ALM
520
8
0
26 Feb 2025
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
International Conference on Human Factors in Computing Systems (CHI), 2025
Christine P. Lee
David J. Porfirio
Xinyu Jessica Wang
Kevin Zhao
Bilge Mutlu
401
19
0
25 Feb 2025
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiaodong Wu
Minhao Wang
Yichen Liu
Xiaoming Shi
He Yan
Xiangju Lu
Junmin Zhu
Wei Zhang
1.1K
10
0
11 Nov 2024
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Thomas Palmeira Ferraz
Kartik Mehta
Yu-Hsiang Lin
Haw-Shiuan Chang
Shereen Oraby
Sijia Liu
Vivek Subramanian
Tagyoung Chung
Mohit Bansal
Nanyun Peng
223
23
0
09 Oct 2024
CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints
Anirudh Atmakuru
Jatin Nainani
Rohith Siddhartha Reddy Bheemreddy
Anirudh Lakkaraju
Zonghai Yao
Hamed Zamani
Haw-Shiuan Chang
319
11
0
05 Oct 2024
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein
Kaisheng Yao
Jing Li
Xinyi Bai
Hamid Palangi
LRM
187
2
0
26 Sep 2024
SysBench: Can Large Language Models Follow System Messages?
Yanzhao Qin
Tao Zhang
Tao Zhang
Yanjun Shen
Wenjing Luo
...
Yujing Qiao
Weipeng Chen
Guosheng Dong
Wentao Zhang
Bin Cui
ALM
337
15
0
20 Aug 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
431
1,595
0
15 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
Jimmy Huang
ELM
ALM
237
82
0
04 Jul 2024
Benchmarking Complex Instruction-Following with Multiple Constraints Composition
Bosi Wen
Pei Ke
Xiaotao Gu
Lindong Wu
Hao Huang
...
Jiaxin Xu
Yiming Liu
Jie Tang
Hongning Wang
Minlie Huang
CoGe
267
88
0
04 Jul 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
228
6
0
22 Mar 2024
FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability
Congying Xia
Chen Xing
Jiangshu Du
Xinyi Yang
Yihao Feng
Ran Xu
Wenpeng Yin
Caiming Xiong
ALM
241
70
0
28 Feb 2024
Can Large Language Models Understand Real-World Complex Instructions?
AAAI Conference on Artificial Intelligence (AAAI), 2023
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
261
85
0
17 Sep 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Xiangru Tang
Yiming Zong
Jason Phang
Yilun Zhao
Wangchunshu Zhou
Arman Cohan
Mark B. Gerstein
LMTD
ELM
ALM
211
16
0
16 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
5.2K
14,855
0
18 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
International Conference on Learning Representations (ICLR), 2023
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
218
51
0
17 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Neural Information Processing Systems (NeurIPS), 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
2.2K
6,226
0
09 Jun 2023
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Subhabrata Mukherjee
Arindam Mitra
Ganesh Jawahar
Sahaj Agarwal
Hamid Palangi
Ahmed Hassan Awadallah
ELM
ALM
LRM
370
335
0
05 Jun 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Neural Information Processing Systems (NeurIPS), 2023
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Abigail Z. Jacobs
Tatsunori B. Hashimoto
ALM
302
741
0
22 May 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
...
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
ELM
LRM
288
712
0
15 May 2023
Controlled Text Generation with Natural Language Instructions
International Conference on Machine Learning (ICML), 2023
Wangchunshu Zhou
Yuchen Eleanor Jiang
Ethan Gotlieb Wilcox
Robert Bamler
Mrinmaya Sachan
356
110
0
27 Apr 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM
ELM
322
696
0
13 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
3.3K
20,007
0
15 Mar 2023
Foundation Models for Decision Making: Problems, Methods, and Opportunities
Sherry Yang
Ofir Nachum
Yilun Du
Jason W. Wei
Pieter Abbeel
Dale Schuurmans
LM&Ro
OffRL
LRM
AI4CE
328
205
0
07 Mar 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM
SyDa
LRM
633
2,746
0
20 Dec 2022
Controllable Text Generation with Language Constraints
Howard Chen
Huihan Li
Danqi Chen
Karthik Narasimhan
215
18
0
20 Dec 2022
PaLM: Scaling Language Modeling with Pathways
Journal of machine learning research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
1.1K
7,275
0
05 Apr 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
940
6,512
0
27 Oct 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
588
1,878
0
15 Oct 2021
Style Control for Schema-Guided Natural Language Generation
Alicia Y. Tsai
Shereen Oraby
Vittorio Perera
Jiun-Yu Kao
Yuheng Du
Anjali Narayan-Chen
Tagyoung Chung
Dilek Z. Hakkani-Tür
211
12
0
24 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
1.0K
4,506
0
03 Sep 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
1.0K
7,406
0
07 Jul 2021