2407.08188
Cited By
Position: Measure Dataset Diversity, Don't Just Claim It
11 July 2024
Dora Zhao
Jerone T. A. Andrews
Orestis Papakyriakopoulos
Alice Xiang
Papers citing "Position: Measure Dataset Diversity, Don't Just Claim It"
13 / 13 papers shown
Title
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
64
11
0
31 Dec 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
24
11
0
11 Sep 2024
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
58
31
0
01 Mar 2024
The Vendi Score: A Diversity Evaluation Metric for Machine Learning
Dan Friedman
Adji Bousso Dieng
EGVM
70
107
0
05 Oct 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
86
25
0
13 May 2022
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng
Arunesh Mathur
Arvind Narayanan
94
92
0
06 Aug 2021
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Scott Ettinger
Shuyang Cheng
Benjamin Caine
Chenxi Liu
Hang Zhao
...
Jiquan Ngiam
Vijay Vasudevan
Alexander McCauley
Jonathon Shlens
Drago Anguelov
123
421
0
20 Apr 2021
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing
Boaz Shmueli
Jan Fell
Soumya Ray
Lun-Wei Ku
100
71
0
20 Apr 2021
One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision
Zaid Khan
Y. Fu
125
51
0
03 Feb 2021
Adding Chit-Chat to Enhance Task-Oriented Dialogues
Kai Sun
Seungwhan Moon
Paul A. Crook
Stephen Roller
Becka Silvert
Bing-Quan Liu
Zhiguang Wang
Honglei Liu
Eunjoon Cho
Claire Cardie
50
66
0
24 Oct 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
100
268
0
24 Jan 2020
FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images
Christiane Zimmermann
Duygu Ceylan
Jimei Yang
Bryan C. Russell
Max Argus
Thomas Brox
3DH
176
394
0
10 Sep 2019
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild
Matthias Muller
Adel Bibi
Silvio Giancola
Salman Al-Subaihi
Bernard Ghanem
192
785
0
28 Mar 2018