Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs

14 October 2023
Chenyang Yang
Rishabh Rustogi
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
    KELM

Papers citing "Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs"

16 / 16 papers shown
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
46
0
0
24 Mar 2025
Orbit: A Framework for Designing and Evaluating Multi-objective Rankers
Chenyang Yang
Tesi Xiao
Michael Shavlovsky
Christian Kastner
Tongshuang Wu
27
0
0
07 Nov 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang
Yining Hong
Grace A. Lewis
Tongshuang Wu
Christian Kastner
36
1
0
14 Sep 2024
MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models
Yuchen Dong
Xiaoxiang Fang
Yuchen Hu
Renshuang Jiang
Zhe Jiang
36
0
0
07 Aug 2024
Navigating LLM Ethics: Advancements, Challenges, and Future Directions
Junfeng Jiao
S. Afroogh
Yiming Xu
Connor Phillips
AILaw
53
19
0
14 May 2024
Clarify: Improving Model Robustness With Natural Language Corrections
Yoonho Lee
Michelle S. Lam
Helena Vasconcelos
Michael S. Bernstein
Chelsea Finn
10
6
0
06 Feb 2024
LLMs for Test Input Generation for Semantic Caches
Zafaryab Rasool
Scott Barnett
David Willie
Stefanus Kurniawan
Sherwin Balugo
Srikanth Thudumu
Mohamed Abdelrazek
13
1
0
16 Jan 2024
(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs
Wanqin Ma
Chenyang Yang
Christian Kastner
11
7
0
18 Nov 2023
Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models
Michael Xieyang Liu
Tongshuang Wu
Tianying Chen
Franklin Mingzhe Li
A. Kittur
Brad A. Myers
LRM
RALM
14
20
0
03 Oct 2023
Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design
Michelle S. Lam
Zixian Ma
Anne Li
Izequiel Freitas
Dakuo Wang
James A. Landay
Michael S. Bernstein
150
21
0
06 Mar 2023
Crawling the Internal Knowledge-Base of Language Models
Roi Cohen
Mor Geva
Jonathan Berant
Amir Globerson
170
74
0
30 Jan 2023
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
58
59
0
14 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
...
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
153
86
0
06 Dec 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Mohit Bansal
Christopher Ré
AAML
OffRL
OOD
133
136
0
13 Jan 2021
Improving fairness in machine learning systems: What do industry practitioners need?
Kenneth Holstein
Jennifer Wortman Vaughan
Hal Daumé
Miroslav Dudík
Hanna M. Wallach
FaML
HAI
181
730
0
13 Dec 2018