POTATO: The Portable Text Annotation Tool

16 December 2022

Jiaxin Pei

Aparna Ananthasubramaniam

Papers citing "POTATO: The Portable Text Annotation Tool"

From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising Jingwen Cai Sara Leckner Johanna Björklund 36 0 0 30 Apr 2025
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale Bowen Jiang Zhuoqun Hao Y. Cho B. Li Yuan Yuan Sihao Chen Lyle Ungar Camillo J. Taylor Dan Roth 37 0 0 19 Apr 2025
Modifying Large Language Model Post-Training for Diverse Creative Writing John Joon Young Chung Vishakh Padmakumar Melissa Roemmele Yuqian Sun Max Kreminski MoMe 46 0 0 21 Mar 2025
Have LLMs Made Active Learning Obsolete? Surveying the NLP Community Julia Romberg Christopher Schröder Julius Gonsior Katrin Tomanek Fredrik Olsson 62 0 0 12 Mar 2025
CULEMO: Cultural Lenses on Emotion -- Benchmarking LLMs for Cross-Cultural Emotion Understanding Tadesse Destaw Belay Ahmed Haj Ahmed Alvin Grissom II Iqra Ameer Grigori Sidorov Olga Kolesnikova Seid Muhie Yimam 41 0 0 12 Mar 2025
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments Hojae Han Seung-won Hwang Rajhans Samdani Yuxiong He ALM 65 2 0 27 Feb 2025
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral Shivani Kumar David Jurgens LRM 41 0 0 21 Feb 2025
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages Shamsuddeen Hassan Muhammad N. Ousidhoum Idris Abdulmumin Jan Philip Wahle Terry Ruas ... Florian Valentin Wunderlich Hanif Muhammad Zhafran Tianhui Zhang Yi Zhou Saif M. Mohammad 33 3 0 17 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks Jing Yang Max Glockner Anderson de Rezende Rocha Iryna Gurevych LRM 62 1 0 07 Feb 2025
Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols John Joon Young Chung Melissa Roemmele Max Kreminski VGen 67 0 0 23 Jan 2025
A Reality Check on Context Utilisation for Retrieval-Augmented Generation Lovisa Hagström Sara Vera Marjanović Haeun Yu Arnav Arora Christina Lioma Maria Maistro Pepa Atanasova Isabelle Augenstein 70 0 0 22 Dec 2024
Mitigating Trauma in Qualitative Research Infrastructure: Roles for Machine Assistance and Trauma-Informed Design Emily Tseng Thomas Ristenpart Nicola Dell 72 1 0 22 Dec 2024
Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs Lynn Greschner Roman Klinger 83 2 0 20 Dec 2024
MetaphorShare: A Dynamic Collaborative Repository of Open Metaphor Datasets Joanne Boisson Arif Mehmood Jose Camacho-Collados 66 0 0 27 Nov 2024
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model Lifan Jiang Zhihui Wang Siqi Yin Guangxiao Ma Peng Zhang Boxi Wu DiffM 51 0 0 28 Aug 2024
The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI Miriam Schirmer Tobias Leemann Gjergji Kasneci Jürgen Pfeffer David Jurgens 80 0 0 12 Aug 2024
BotEval: Facilitating Interactive Human Evaluation Hyundong Justin Cho Thamme Gowda Yuyang Huang Zixun Lu Tianli Tong Jonathan May ALM 37 1 0 25 Jul 2024
Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions Xiang Li Haoran Tang Siyu Chen Ziwei Wang Ryan Chen Marcin Abram LRM 29 1 0 02 Jul 2024
AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries Irina Saparina Mirella Lapata 30 10 0 27 Jun 2024
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows Xingjian Zhang Yutong Xie Jin Huang Jinge Ma Zhaoying Pan ... Ziyang Xiong Tolga Ergen Dongsub Shim Honglak Lee Qiaozhu Mei 41 10 0 10 Jun 2024
Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions Soumyadeep Roy A. Khatua Fatemeh Ghoochani Uwe Hadler Wolfgang Nejdl Niloy Ganguly ELM LM&MA 33 8 0 20 Apr 2024
Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data Oana Ignat Gayathri Ganesh Lakshmy Rada Mihalcea DeLMO 19 1 0 19 Apr 2024
SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes Timothee Mickus Elaine Zosa Raúl Vázquez Teemu Vahtola Jörg Tiedemann Vincent Segonne Alessandro Raganato Marianna Apidianaki HILM LRM 21 20 0 12 Mar 2024
Understanding Fine-grained Distortions in Reports of Scientific Findings Amelie Wuhrl Dustin Wright Roman Klinger Isabelle Augenstein 25 3 0 19 Feb 2024
EEVEE: An Easy Annotation Tool for Natural Language Processing Axel Sorensen Siyao Peng Barbara Plank Rob van der Goot 18 1 0 05 Feb 2024
The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change Dominik Schlechtweg S. Virk Pauline Sander Emma Sköldberg Lukas Theuer Linke Tuo Zhang Nina Tahmasebi Jonas Kuhn Sabine Schulte im Walde 15 10 0 21 Nov 2023
Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks Andrea Sottana Bin Liang Kai Zou Zheng Yuan ALM ELM LM&MA 25 54 0 20 Oct 2023
Unsupervised Candidate Answer Extraction through Differentiable Masker-Reconstructor Model Zhuoer Wang Yicheng Wang Ziwei Zhu James Caverlee 21 0 0 19 Oct 2023
An Emulator for Fine-Tuning Large Language Models using Small Language Models Eric Mitchell Rafael Rafailov Archit Sharma Chelsea Finn Christopher D. Manning ALM 27 51 0 19 Oct 2023
Human Feedback is not Gold Standard Tom Hosking Phil Blunsom Max Bartolo ALM 14 48 0 28 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Xingyao Wang Zihan Wang Jiateng Liu Yangyi Chen Lifan Yuan Hao Peng Heng Ji LRM 125 138 0 19 Sep 2023
On the Challenges of Building Datasets for Hate Speech Detection Vitthal Bhandari 6 1 0 06 Sep 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation David Heineman Yao Dou Wei-ping Xu 22 7 0 14 Aug 2023
Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships David Jurgens Agrima Seth Jack E. Sargent Athena Aghighi Michael Geraci 9 7 0 06 Jul 2023
When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset Jiaxin Pei David Jurgens 12 31 0 12 Jun 2023
Chinese Open Instruction Generalist: A Preliminary Release Ge Zhang Yemin Shi Ruibo Liu Ruibin Yuan Yizhi Li ... Zhaoqun Li Zekun Wang Chenghua Lin Wen-Fen Huang Jie Fu ALM 17 28 0 17 Apr 2023
DMOps: Data Management Operation and Recipes E. Choi Chanjun Park 17 7 0 02 Jan 2023
Modeling Information Change in Science Communication with Semantically Matched Paraphrases Dustin Wright Jiaxin Pei David Jurgens Isabelle Augenstein 21 14 0 24 Oct 2022
SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis Jiaxin Pei Vítor Silva Maarten W. Bos Yozon Liu Leonardo Neves David Jurgens Francesco Barbieri 42 28 0 03 Oct 2022
Measuring Sentence-Level and Aspect-Level (Un)certainty in Science Communications Jiaxin Pei David Jurgens 23 28 0 30 Sep 2021
An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog Xingyao Wang David Jurgens 17 5 0 24 Sep 2021
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing Boaz Shmueli Jan Fell Soumya Ray Lun-Wei Ku 100 86 0 20 Apr 2021
Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations Jiajun Bao J. Wu Yiming Zhang Eshwar Chandrasekharan David Jurgens 38 45 0 16 Feb 2021
A Survey on Bias and Fairness in Machine Learning Ninareh Mehrabi Fred Morstatter N. Saxena Kristina Lerman Aram Galstyan SyDa FaML 294 4,187 0 23 Aug 2019