FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking. North American Chapter of the Association for Computational Linguistics (NAACL), 2025 |
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code. International Conference on Learning Representations (ICLR), 2025 |
Towards Effective Discrimination Testing for Generative AI. Conference on Fairness, Accountability and Transparency (FAccT), 2024 |
Benchmark Data Repositories for Better Benchmarking. Neural Information Processing Systems (NeurIPS), 2024 |
Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention. Conference on Fairness, Accountability and Transparency (FAccT), 2024 |
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks. International Conference on Machine Learning (ICML), 2024 |