ResearchTrend.AI
Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability

11 June 2021
KayYen Wong
Praveen K. Paritosh
Lora Aroyo

Papers citing "Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability"

16 / 16 papers shown
Automating eHMI Action Design with LLMs for Automated Vehicle Communication
Ding Xia
Xinyue Gui
Fan Gao
Dongyuan Li
Mark Colley
Takeo Igarashi
22
0
0
27 May 2025
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
Cheng Charles Ma
Kevin Hyekang Joo
Alexandria K. Vail
Sunreeta Bhattacharya
Álvaro Fernández García
Kailana Baker-Matsuoka
Sheryl Mathew
Lori L. Holt
Fernando De la Torre
72
4
0
13 Sep 2024
Rater Cohesion and Quality from a Vicarious Perspective
Deepak Pandita
Tharindu Cyril Weerasooriya
Sujan Dutta
Sarah K. K. Luger
Tharindu Ranasinghe
Ashiqur R. KhudaBukhsh
Marcos Zampieri
Christopher M. Homan
61
1
0
15 Aug 2024
Localizing and Mitigating Errors in Long-form Question Answering
Rachneet Sachdeva
Yixiao Song
Mohit Iyyer
Iryna Gurevych
HILM
78
1
0
16 Jul 2024
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
Jaewoo Ahn
Taehyun Lee
Junyoung Lim
Jin-Hwa Kim
Sangdoo Yun
Hwaran Lee
Gunhee Kim
LLMAG, HILM
88
13
0
28 May 2024
D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation
Aida Mostafazadeh Davani
Mark Díaz
Dylan K. Baker
Vinodkumar Prabhakaran
74
10
0
16 Apr 2024
Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback
Jiashuo Wang
Chunpu Xu
Chak Tou Leong
Wenjie Li
Jing Li
105
2
0
11 Jan 2024
Evolving Domain Adaptation of Pretrained Language Models for Text Classification
Yun-Shiuan Chuang
Yi Wu
Dhruv Gupta
Rheeya Uppaal
Ananya Kumar
Luhang Sun
Makesh Narsimhan Sreedhar
Sijia Yang
Timothy T. Rogers
Junjie Hu
VLM
115
4
0
16 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
99
9
0
09 Nov 2023
Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities
Senjuti Dutta
Sid Mittal
Sherol Chen
Deepak Ramachandran
Ravi Rajakumar
Ian D Kivlichan
Sunny Mak
Alena Butryna
Praveen Paritosh
109
7
0
01 Nov 2023
How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments
Angela M. Schöpke-Gonzalez
Siqi Wu
Sagar Kumar
Paul Resnick
Libby Hemphill
41
2
0
12 Sep 2023
Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection
Oana Inel
Tim Draws
Lora Aroyo
110
6
0
22 Aug 2023
Student's t-Distribution: On Measuring the Inter-Rater Reliability When the Observations are Scarce
Serge Gladkoff
Lifeng Han
Goran Nenadic
77
5
0
08 Mar 2023
Undesirable Biases in NLP: Addressing Challenges of Measurement
Oskar van der Wal
Dominik Bachmann
Alina Leidinger
L. Maanen
Willem H. Zuidema
K. Schulz
84
7
0
24 Nov 2022
Measuring and Improving Semantic Diversity of Dialogue Generation
Seungju Han
Beomsu Kim
Buru Chang
85
15
0
11 Oct 2022
k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations
KayYen Wong
Praveen K. Paritosh
HILM
44
7
0
24 Mar 2022