184

Dynaword: From One-shot to Continuously Developed Datasets

Main:5 Pages
3 Figures
Bibliography:3 Pages
9 Tables
Appendix:3 Pages
Abstract

Large-scale datasets are foundational for research and development in natural language processing. However, current approaches face three key challenges: (1) reliance on ambiguously licensed sources restricting use, sharing, and derivative works; (2) static dataset releases that prevent community contributions and diminish longevity; and (3) quality assurance processes restricted to publishing teams rather than leveraging community expertise.

View on arXiv
Comments on this paper