Dynaword: From One-shot to Continuously Developed Datasets
Main:5 Pages
3 Figures
Bibliography:3 Pages
9 Tables
Appendix:3 Pages
Abstract
Large-scale datasets are foundational for research and development in natural language processing. However, current approaches face three key challenges: (1) reliance on ambiguously licensed sources restricting use, sharing, and derivative works; (2) static dataset releases that prevent community contributions and diminish longevity; and (3) quality assurance processes restricted to publishing teams rather than leveraging community expertise.
View on arXivComments on this paper
