Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability

11 June 2021

Praveen K. Paritosh

Lora Aroyo

Papers citing "Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability"

Title | Authors | Tags | Counts | Date
Automating eHMI Action Design with LLMs for Automated Vehicle Communication | Ding Xia, Xinyue Gui, Fan Gao, Dongyuan Li, Mark Colley, Takeo Igarashi | - | 22 0 0 | 27 May 2025
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation | Cheng Charles Ma, Kevin Hyekang Joo, Alexandria K. Vail, Sunreeta Bhattacharya, Álvaro Fernández García, Kailana Baker-Matsuoka, Sheryl Mathew, Lori L. Holt, Fernando De la Torre | - | 72 4 0 | 13 Sep 2024
Rater Cohesion and Quality from a Vicarious Perspective | Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan | - | 61 1 0 | 15 Aug 2024
Localizing and Mitigating Errors in Long-form Question Answering | Rachneet Sachdeva, Yixiao Song, Mohit Iyyer, Iryna Gurevych | HILM | 78 1 0 | 16 Jul 2024
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim | LLMAG, HILM | 88 13 0 | 28 May 2024
D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation | Aida Mostafazadeh Davani, Mark Díaz, Dylan K. Baker, Vinodkumar Prabhakaran | - | 74 10 0 | 16 Apr 2024
Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback | Jiashuo Wang, Chunpu Xu, Chak Tou Leong, Wenjie Li, Jing Li | - | 105 2 0 | 11 Jan 2024
Evolving Domain Adaptation of Pretrained Language Models for Text Classification | Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T. Rogers, Junjie Hu | VLM | 115 4 0 | 16 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives | Vinodkumar Prabhakaran, Christopher Homan, Lora Aroyo, Aida Mostafazadeh Davani, Alicia Parrish, Alex S. Taylor, Mark Díaz, Ding Wang, Greg Serapio-García | - | 99 9 0 | 09 Nov 2023
Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities | Senjuti Dutta, Sid Mittal, Sherol Chen, Deepak Ramachandran, Ravi Rajakumar, Ian D Kivlichan, Sunny Mak, Alena Butryna, Praveen Paritosh, University of Tennessee | - | 109 7 0 | 01 Nov 2023
How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments | Angela M. Schöpke-Gonzalez, Siqi Wu, Sagar Kumar, Paul Resnick, Libby Hemphill | - | 41 2 0 | 12 Sep 2023
Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection | Oana Inel, Tim Draws, Lora Aroyo | - | 110 6 0 | 22 Aug 2023
Student's t-Distribution: On Measuring the Inter-Rater Reliability When the Observations are Scarce | Serge Gladkoff, Lifeng Han, Goran Nenadic | - | 77 5 0 | 08 Mar 2023
Undesirable Biases in NLP: Addressing Challenges of Measurement | Oskar van der Wal, Dominik Bachmann, Alina Leidinger, L. Maanen, Willem H. Zuidema, K. Schulz | - | 84 7 0 | 24 Nov 2022
Measuring and Improving Semantic Diversity of Dialogue Generation | Seungju Han, Beomsu Kim, Buru Chang | - | 85 15 0 | 11 Oct 2022
k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations | KayYen Wong, Praveen K. Paritosh | HILM | 44 7 0 | 24 Mar 2022