ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.10928
  4. Cited By
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill
  Sets

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

20 July 2023
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
    ALM
ArXivPDFHTML

Papers citing "FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets"

31 / 31 papers shown
Title
TRAIL: Trace Reasoning and Agentic Issue Localization
TRAIL: Trace Reasoning and Agentic Issue Localization
Darshan Deshpande
Varun Gangal
Hersh Mehta
Jitin Krishnan
Anand Kannappan
Rebecca Qian
16
0
0
13 May 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian-Yu Guan
J. Wu
J. Li
Chuanqi Cheng
Wei Yu Wu
LM&MA
69
0
0
21 Mar 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
83
0
0
28 Jan 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
72
17
0
17 Jan 2025
Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang
Jun Bai
B. Li
Yanmeng Wang
Rumei Li
Chenghua Lin
Wenge Rong
39
0
0
31 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
108
61
0
25 Nov 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
58
3
0
30 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
48
9
0
03 Oct 2024
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
29
2
0
20 Sep 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment
Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao
Yixuan Li
84
5
0
13 Sep 2024
STORYSUMM: Evaluating Faithfulness in Story Summarization
STORYSUMM: Evaluating Faithfulness in Story Summarization
Melanie Subbiah
Faisal Ladhak
Akankshya Mishra
Griffin Adams
Lydia B. Chilton
Kathleen McKeown
34
4
0
09 Jul 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
90
29
0
09 Jun 2024
TS-Align: A Teacher-Student Collaborative Framework for Scalable
  Iterative Finetuning of Large Language Models
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models
Chen Zhang
Chengguang Tang
Dading Chong
Ke Shi
Guohua Tang
Feng Jiang
Haizhou Li
27
4
0
30 May 2024
Demystifying Instruction Mixing for Fine-tuning Large Language Models
Demystifying Instruction Mixing for Fine-tuning Large Language Models
Renxi Wang
Haonan Li
Minghao Wu
Yuxia Wang
Xudong Han
Chiyu Zhang
Timothy Baldwin
19
0
0
17 Dec 2023
Routing to the Expert: Efficient Reward-guided Ensemble of Large
  Language Models
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Keming Lu
Hongyi Yuan
Runji Lin
Junyang Lin
Zheng Yuan
Chang Zhou
Jingren Zhou
MoE
LRM
32
52
0
15 Nov 2023
Can Large Language Models Capture Dissenting Human Voices?
Can Large Language Models Capture Dissenting Human Voices?
Noah Lee
Na Min An
James Thorne
ALM
34
30
0
23 May 2023
Aligning Large Language Models through Synthetic Feedback
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM
SyDa
73
67
0
23 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
206
559
0
03 May 2023
DIFFQG: Generating Questions to Summarize Factual Changes
DIFFQG: Generating Questions to Summarize Factual Changes
Jeremy R. Cole
Palak Jain
Julian Martin Eisenschlos
Michael J.Q. Zhang
Eunsol Choi
Bhuwan Dhingra
KELM
19
3
0
01 Mar 2023
Efficiently Enhancing Zero-Shot Performance of Instruction Following
  Model via Retrieval of Soft Prompt
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
21
2
0
06 Oct 2022
INSCIT: Information-Seeking Conversations with Mixed-Initiative
  Interactions
INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions
Zeqiu Wu
Ryu Parish
Hao Cheng
Sewon Min
Prithviraj Ammanabrolu
Mari Ostendorf
Hannaneh Hajishirzi
65
14
0
02 Jul 2022
Fine-tuned Language Models are Continual Learners
Fine-tuned Language Models are Continual Learners
Thomas Scialom
Tuhin Chakrabarty
Smaranda Muresan
CLL
LRM
134
116
0
24 May 2022
Instruction Induction: From Few Examples to Natural Language Task
  Descriptions
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
107
133
0
22 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Understanding Dataset Difficulty with V\mathcal{V}V-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
154
227
0
16 Oct 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
364
0
15 Oct 2021
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional
  Answers
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
Haitian Sun
William W. Cohen
Ruslan Salakhutdinov
59
33
0
13 Oct 2021
ContractNLI: A Dataset for Document-level Natural Language Inference for
  Contracts
ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts
Yuta Koreeda
Christopher D. Manning
AILaw
87
96
0
05 Oct 2021
A Token-level Reference-free Hallucination Detection Benchmark for
  Free-form Text Generation
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
217
140
0
18 Apr 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit
  Reasoning Strategies
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
245
671
0
06 Jan 2021
1